Alyah Benchmark: Evaluating Emirati Dialect Capabilities in Arabic LLMs
AI Impact Summary
The Alyah benchmark evaluates how well Arabic large language models understand the Emirati dialect, filling a gap left by existing benchmarks, which primarily assess Modern Standard Arabic. Its 1,173 manually curated samples probe beyond surface-level lexical knowledge to target cultural and pragmatic nuances. Initial evaluation results show that instruction-tuned models, such as falcon-h1-arabic-7b-instruct, significantly outperform base models in understanding the Emirati dialect, highlighting the importance of alignment and instruction tuning for this specialized domain.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info