Research
My current research lies at the intersection of computational morphology and computational modeling of language acquisition, and in transferring insights from both into low-resource NLP.
Related research topics of interest to me:
Cross-dialectal computational modeling of morphology.
The impact of linguistically driven design decisions on computational language modeling.
The behavior of modern ML techniques on tasks involving low-resource languages: training and evaluation.
Computational modeling of language acquisition.
Corpus mining and resource extraction for low-resource languages.
Selected Projects:
Gulf Arabic: Modeling Challenges of a Low-Resource Variety
(Khalifa et al 2016)
A collection of over 100 million words of Gulf Arabic.
The corpus consists of long conversational novels whose authors write under noms de plume.
The corpus is available to browse and can be obtained as 5-gram language models (à la Google Ngrams).
(Khalifa et al 2017)
A full morphological analyzer for Emirati Gulf Arabic verbs based on phonetic dictionary entries of Emirati verb lemmas.
A lexicon of over 2,400 verb lemmas, each with its English gloss, real root, and templatic pattern.
(Khalifa et al 2018)
A commissioned full morphological annotation of 200,000 words of Emirati Gulf Arabic, drawn from eight different novels in Gumar.
A morphological lexicon and analyzer are automatically extracted from the data.
Full Morphological Neural Disambiguation System for Gulf Arabic
(Khalifa et al 2020)
Work in progress on benchmarking a first-of-its-kind morphological disambiguator for Gulf Arabic, a low-resource dialect of Arabic.
(Obeid et al 2020)
Developed various Arabic morphological modeling modules:
Morphological analysis, generation, and reinflection.
Morphological tokenization.
Morphological disambiguation.