Research

My current research lies at the intersection of computational morphology and the computational modeling of language acquisition, and in transferring insights from both into low-resource NLP.

Related research topics that interest me:

Cross-dialectal computational modeling of morphology.
The impact of linguistically driven design decisions on computational language modeling.
The behavior of modern ML techniques on tasks involving low-resource languages: training and evaluation.
Computational modeling of language acquisition.
Corpus mining and resource extraction for low-resource languages.

Selected Projects:

Gulf Arabic: Modeling Challenges of a Low-Resource Variety

The Gumar Corpus

(Khalifa et al 2016)

- A collection of over 100M words of Gulf Arabic.
- The corpus comprises long conversational novels whose authors write under noms de plume.
- The corpus is available to browse, and can be obtained as 5-gram language models (à la Google Ngrams).

Emirati Gulf Arabic Verb Morphological Analyzer and Lexicon

(Khalifa et al 2017)

- A full morphological analyzer for Emirati Gulf Arabic verbs based on phonetic dictionary entries of Emirati verb lemmas.
- A lexicon of over 2,400 verb lemmas, each with its English gloss, real root, and templatic pattern.

Full Morphological Annotation of Emirati Gulf Arabic

(Khalifa et al 2018)

- A commissioned full morphological annotation of 200,000 words of Emirati Gulf Arabic, drawn from eight novels in the Gumar corpus.
- A morphological lexicon and analyzer are automatically extracted from the data.

Full Morphological Neural Disambiguation System for Gulf Arabic

(Khalifa et al 2020)

- Work in progress on benchmarking a first-of-its-kind morphological disambiguator for Gulf Arabic, a low-resource dialect of Arabic.

CAMeL Tools: An Open Source Python Toolkit for Arabic NLP

(Obeid et al 2020)

Developed various Arabic morphological modeling modules:

Morphological analysis, generation, and reinflection.
Morphological tokenization.
Morphological disambiguation.

To view a full range of what I've been working on, please refer to my publication page.
