My current research sits at the intersection of computational morphology and the computational modeling of language acquisition, with a focus on transferring insights from both into low-resource NLP.
Related research topics that interest me:
Computational modeling of morphology cross-dialectally.
The impact of linguistically driven design decisions on computational language modeling.
The behavior of modern ML techniques on tasks involving low-resource languages: training and evaluation.
Computational modeling of language acquisition.
Corpus mining and resource extraction for low-resource languages.
Gulf Arabic: Modeling Challenges of a Low-Resource Variety
(Khalifa et al 2016)
A collection of over 100 million words of Gulf Arabic.
The corpus comprises long conversational novels whose authors write under noms de plume.
The corpus is available to browse and can be obtained as 5-gram language models (à la Google Ngrams).
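As a rough illustration of what a 5-gram resource contains, the sketch below counts every contiguous five-token window in a token list. The function name `count_ngrams` and the placeholder tokens are hypothetical, and this is not the pipeline used to build the released models.

```python
from collections import Counter

def count_ngrams(tokens, n=5):
    """Count every contiguous n-token window in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical placeholder tokens, not actual corpus data.
tokens = "w1 w2 w3 w4 w5 w6 w1 w2 w3 w4 w5".split()
counts = count_ngrams(tokens, n=5)
```

Inputs shorter than `n` tokens simply yield an empty counter.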
(Khalifa et al 2017)
A full morphological analyzer for Emirati Gulf Arabic verbs based on phonetic dictionary entries of Emirati verb lemmas.
A lexicon of over 2,400 verb lemmas, each with its English gloss, real root, and templatic pattern.
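To make the root and templatic pattern fields concrete, here is a minimal sketch of how a root can be interdigitated with a pattern to produce a stem. The notation is an assumption made for illustration: radicals are separated by dots, and digits in the pattern mark radical slots. The function name `interdigitate` and the transliterated example are likewise hypothetical and do not reflect the lexicon's actual format.

```python
def interdigitate(root, pattern):
    """Fill digit slots (1, 2, 3, ...) in a pattern with the root's radicals."""
    radicals = root.split(".")  # assumed notation: "k.t.b" -> ["k", "t", "b"]
    out = []
    for ch in pattern:
        if ch.isdigit():
            out.append(radicals[int(ch) - 1])  # slot k takes the k-th radical
        else:
            out.append(ch)  # vowels and affix material are copied as-is
    return "".join(out)

stem = interdigitate("k.t.b", "1a2a3")  # "katab"
```

A slot may repeat in the pattern, so a geminated form such as `1a22a3` doubles the second radical.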
(Khalifa et al 2018)
A commissioned full morphological annotation of 200,000 words of Emirati Gulf Arabic from eight different novels from Gumar.
A morphological lexicon and analyzer are automatically extracted from the data.
Full Morphological Neural Disambiguation System for Gulf Arabic (Khalifa et al 2020)
Work in progress on benchmarking the first morphological disambiguator of its kind for Gulf Arabic, a low-resource dialect of Arabic.
(Obeid et al 2020)
Developed various Arabic morphological modeling modules:
Morphological analysis, generation, and reinflection.