Qatar’s ACRPS Unveils DALLA Framework To Slash Arabic AI Training Costs

The Doha-based Arab Center for Research and Policy Studies (ACRPS) has released the DALLA framework, an open-source pipeline intended to lower the cost of developing Arabic large language models (LLMs). Developed by the center’s Unit for Research in Arabic Digital and Social Spheres, DALLA targets the computational barriers to entry for Arabic AI. By cutting training and inference costs by 50 to 75 percent, the initiative lets smaller research teams and organizations build socially and culturally aware AI models without the large budgets typically associated with frontier technology.

Revolutionizing Efficiency With Token Reuse

At the core of the DALLA framework is a token reuse technique that addresses one of the primary challenges in non-English LLM development: efficiency. The framework achieves a 2x to 4x reduction in token count compared to the original models, which yields proportional cost savings in both training and inference: halving the token count cuts costs by about 50 percent, and quartering it cuts them by about 75 percent. The method lets developers improve Arabic language coverage without increasing the model’s vocabulary size, maintaining high performance while keeping resource use under control. This gives startups, universities, and independent researchers a viable path into the AI ecosystem.
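The link between the token-count reduction and the cost figures quoted earlier can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes, as the article's "proportional" wording suggests, that cost scales linearly with the number of tokens processed, and the function name cost_savings is hypothetical rather than part of DALLA.

```python
def cost_savings(token_reduction_factor: float) -> float:
    """Fraction of cost saved when the token count shrinks by the given factor.

    Assumption (not from the DALLA release): cost is directly proportional
    to the number of tokens processed, so r-times fewer tokens means the
    cost drops to 1/r of the original.
    """
    if token_reduction_factor < 1:
        raise ValueError("reduction factor must be at least 1")
    return 1.0 - 1.0 / token_reduction_factor


if __name__ == "__main__":
    for factor in (2, 3, 4):
        print(f"{factor}x fewer tokens -> {cost_savings(factor):.0%} lower cost")
```

Under that linear-cost assumption, a 2x reduction corresponds to 50 percent savings and a 4x reduction to 75 percent, which matches the 50 to 75 percent range ACRPS cites. Real-world savings may differ, since not every component of training and inference cost scales with token count.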

New Open-Source Models For The Community

To demonstrate the pipeline’s capabilities, ACRPS has released two new open-weight models via the AI community platform Hugging Face under a Creative Commons license. The first, dalla-gemma-it 9B, is adapted from Google’s Gemma 2 using a SentencePiece token reuse method. The second, dalla-llama-it 8B, is based on Meta’s Llama 3.1 and uses the R-BPE framework. Both models have been further trained on curated, culturally grounded Arabic data, which ACRPS says gives them more fluent generation and better value alignment with Arab communities than standard off-the-shelf global models.

Prioritizing Data Sovereignty and Culture

Beyond technical efficiency, DALLA places a strong emphasis on data privacy and cultural relevance. The framework is designed to address the “value gaps” often found in Western-centric generative AI models when applied to social sciences and humanities in the Arab world. By enabling organizations to train and operate models in-house, DALLA protects intellectual property and sensitive training data from exposure to large global technology firms. This approach supports data sovereignty and allows for the creation of AI that accurately reflects the social, cultural, and political nuances of the MENA region.

Complementing The Regional Ecosystem

ACRPS positions DALLA as a complementary force to existing sovereign Arabic models like Allam, Jais, and Fanar. Having previously contributed to the development of Qatar’s Fanar v1, the ACRPS team views this new framework as a tool to foster a broader community of responsible AI research. The open-source nature of the pipeline—covering data processing, embeddings, and fine-tuning—encourages collaboration across the research, media, and education sectors, laying the groundwork for a self-sustaining Arabic AI development ecosystem.

About the Arab Center for Research and Policy Studies

Established in 2010, the Arab Center for Research and Policy Studies (ACRPS) is an independent research institution based in Doha, Qatar, with a global presence including offices in Beirut, Tunis, Washington, DC, Paris, and Madrid. The Center is dedicated to fostering systematic research in the social sciences and humanities, and it has published hundreds of academic titles as well as peer-reviewed journals. Its Unit for Research in Arabic Social and Digital Spaces focuses on building digital infrastructure that bridges computational linguistics with academic rigor to address cultural gaps in modern technology.

Source: Middle East AI News
