Computational modeling of morphology cross-dialectally.
The impact of linguistically driven design decisions on computational language modeling.
The behavior of modern ML techniques on tasks involving low-resource languages: training and evaluation.
Computational modeling of language acquisition.
Corpus mining and resource extraction for low-resource languages.
A collection of over 100M words of Gulf Arabic.
The corpus comprises long conversational novels whose author names are noms de plume.
The corpus is available to browse and to obtain as 5-gram language models (à la Google Ngrams).
A full morphological analyzer for Emirati Gulf Arabic verbs based on phonetic dictionary entries of Emirati verb lemmas.
A lexicon of over 2,400 verb lemmas, each with its English gloss, real root, and templatic pattern.
A commissioned full morphological annotation of 200,000 words of Emirati Gulf Arabic drawn from eight novels in Gumar.
A morphological lexicon and analyzer are automatically extracted from the data.
Work in progress on benchmarking a first-of-its-kind morphological disambiguator for Gulf Arabic, a low-resource dialect of Arabic.
Morphological analysis, generation, and reinflection.
Morphological tokenization.
Morphological disambiguation.