Extraction of Multi-Word Terms and Complex Terms from the Classical Arabic Text of the Quran

Sameer M M Alrehaili, Eric Atwell


The identification of domain-specific terms is a crucial step in many natural language processing applications. Term extraction is the process of obtaining a set of terms that represent the domain of a given text. Most term extraction research on the Quran has used translated text instead of the original Classical Arabic text. Extracting terms from the original Arabic text rather than from a translation may help in retrieving more relevant terms, because some Islamic terms in the Quran lack equivalents in other languages. This paper demonstrates a hybrid method for acquiring a list of domain-specific terms from the Arabic text of the Quran. The resulting list of terms was validated using a common evaluation metric for ranked lists; a precision of up to 0.81 was achieved for the top 200 terms. We discuss the achieved precision in the context of two existing datasets from previous research.
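
The evaluation metric is not named above. One common choice for ranked term lists is precision at k: the fraction of the top k candidates that appear in a reference (gold) term list. The following is a minimal sketch under that assumption; the function name `precision_at_k` and the toy terms are hypothetical and are not taken from the paper.

```python
def precision_at_k(ranked_terms, gold_terms, k):
    """Return the fraction of the top-k ranked candidates found in gold_terms.

    Assumption: the metric is standard precision at k, i.e. hits / k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_terms[:k]
    # Count candidates in the top k that also occur in the reference list.
    hits = sum(1 for term in top_k if term in gold_terms)
    return hits / k


# Toy data for illustration only; these are placeholders, not Quran terms.
ranked = ["term_a", "term_b", "term_c", "term_d"]
gold = {"term_a", "term_c"}

p_at_1 = precision_at_k(ranked, gold, 1)
p_at_2 = precision_at_k(ranked, gold, 2)
p_at_4 = precision_at_k(ranked, gold, 4)
```

Under this definition, a precision of 0.81 at k = 200 would correspond to 162 of the top 200 candidates matching the reference list.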

