About BAREC

The overarching objective of the BAREC project is to develop a comprehensive reference resource to facilitate the study and evaluation of Arabic readability across the Arab world. The project aligns with the recommendations of the Arabic language curriculum research that the Abu Dhabi Arabic Language Center is currently conducting. BAREC adopts an evidence-based approach and generates practical resources and tools to support and enhance the use of the Arabic language. To this end, we aim to compile a corpus of 10 million words that encompasses diverse genres, topics, and countries of origin, with a particular focus on readability levels. Portions of this corpus are manually annotated to mark vocabulary and syntactic complexity. We are also building a comprehensive lexicon annotated for readability levels. These annotations serve as the basis for developing artificial intelligence (AI) tools that automatically annotate the remainder of the corpus. In addition, we are designing AI tools that help content creators assess the readability levels of their materials for specific target audiences.

Project start date: September 2023

  • Prof. Nizar Habash (Principal Investigator): NYUAD Professor of Computer Science and Director of the Computational Approaches to Modeling Language (CAMeL) Lab
  • Prof. Hanada Taha (Co-Principal Investigator): Director of the ZAI Centre at Zayed University
  • Mr. Khalid Nabig: Primary Engineer
  • Ms. Kinda Altarbouch: Supporting Engineer
  • Ms. Nour Rabih: MS Student
  • Annotation Team: Reem Faraj, Zeina Zeino, Mirvat Dawi, Rita Raad, Sawsan Tannir, Adel Wizani, Samar Zeino

  • 1M words annotated for sentence-level readability
  • 19 readability levels
  • 40K lemmas