Arabic-Leaderboards update: AraGen-03-25, Arabic Instruction Following, and MBZUAI-backed Arabic-Leaderboards Space

AI Impact Summary

The MBZUAI-backed Arabic-Leaderboards Space now consolidates Arabic evaluations in one place, hosting AraGen-03-25 and Arabic Instruction Following, with plans to add more modalities. The AraGen-03-25 release expands the dataset to 340 pairs across the QA, Reasoning, Safety, and Orthographic categories and refines the judge system prompt. Dynamic evaluation and ranking analyses show that the top models remain stable overall, but Claude variants and gpt-4o shift notably when the prompts and datasets change. The expanded, more challenging benchmark will influence how teams evaluate Arabic models and compare deployment candidates. The Space also invites external contributions of new leaderboards.

Affected Systems

Arabic-Leaderboards Space
AraGen Leaderboard

Date: not specified
Change type: capability
Severity: info
