Extraction of Multi-Word Terms and Complex Terms from the Classical Arabic Text of the Quran

Sameer M M Alrehaili, Eric Atwell


The identification of domain-specific terms is a crucial step in many natural language processing applications. Term extraction is the process of obtaining a set of terms that represent the domain of a given text. Most term extraction research on the Quran has used translated text rather than the original Classical Arabic text of the Quran. Extracting terms from the original Arabic text rather than from a translation may help retrieve more relevant terms, because some Quranic terms lack Islamic equivalents in other languages. This paper presents a hybrid method for acquiring a list of domain-specific terms from the Arabic text of the Quran. The resulting list of terms was validated using a common evaluation metric for ranked lists, and a precision of up to 0.81 was achieved for the top 200 terms. We discuss this precision in the context of two existing datasets from previous research.
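The abstract reports precision for the top 200 terms of a ranked list. As a minimal sketch, assuming the standard precision-at-k definition (the fraction of the first k ranked candidates that are judged correct), the metric can be computed as below. The function name `precision_at_k` and the transliterated example terms and gold set are hypothetical illustrations, not the authors' code or data.

```python
from typing import Iterable, Sequence


def precision_at_k(ranked_terms: Sequence[str], gold_terms: Iterable[str], k: int) -> float:
    """Precision@k: share of the top-k ranked candidates found in the gold set.

    Assumes the standard definition that divides by k, so a ranked list
    shorter than k is penalised for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    gold = set(gold_terms)
    hits = sum(1 for term in ranked_terms[:k] if term in gold)
    return hits / k


# Hypothetical ranked output of a term extractor (transliterated, illustration only).
ranked = [
    "yawm al-qiyama",
    "al-sirat al-mustaqim",
    "ahl al-kitab",
    "qala",
    "al-masjid al-haram",
]
# Hypothetical gold-standard terms, e.g. from a manually validated list.
gold = {"yawm al-qiyama", "ahl al-kitab", "al-masjid al-haram", "al-sirat al-mustaqim"}

for k in (1, 3, 5):
    print(f"P@{k} = {precision_at_k(ranked, gold, k):.2f}")
```

In the setting described in the abstract, k would be 200 and the gold set would come from the validation resources used by the authors; how those judgments were obtained is not specified here.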
