My current research lies at the intersection of computational morphology and computational modeling of language acquisition, and in transferring insights from both into low-resource NLP.

Related research topics of interest:

  • Cross-dialectal computational modeling of morphology.

  • The impact of linguistically driven design decisions on computational language modeling.

  • The behavior of modern ML techniques on tasks involving low-resource languages: training and evaluation.

  • Computational modeling of language acquisition.

  • Corpus mining and resource extraction for low-resource languages.

Selected Projects:

Gulf Arabic: Modeling Challenges of a Low-Resource Variety

(Khalifa et al. 2016)
    • A collection of over 100M words of Gulf Arabic.

    • The corpus consists of long conversational novels whose authors write under noms de plume.

    • The corpus is available for browsing and can be obtained as 5-gram language models (à la Google Ngrams).

(Khalifa et al. 2017)
    • A full morphological analyzer for Emirati Gulf Arabic verbs based on phonetic dictionary entries of Emirati verb lemmas.

    • A lexicon of over 2,400 verb lemmas, each with its English gloss, real root, and templatic pattern.

(Khalifa et al. 2018)
    • A commissioned full morphological annotation of 200,000 words of Emirati Gulf Arabic, drawn from eight different novels in the Gumar corpus.

    • A morphological lexicon and analyzer are automatically extracted from the data.

Full Morphological Neural Disambiguation System for Gulf Arabic

(Khalifa et al. 2020)
    • Work in progress on benchmarking a first-of-its-kind morphological disambiguator for Gulf Arabic, a low-resource dialect of Arabic.

(Obeid et al. 2020)

Developed various Arabic morphological modeling modules:

  • Morphological analysis, generation, and reinflection.

  • Morphological tokenization.

  • Morphological disambiguation.

For the full range of my work, please refer to my publications page.