AraGen Benchmark and Leaderboard: 3C3H-based Arabic LLM Evaluation Framework
AI Impact Summary
AraGen introduces a dynamic benchmark and leaderboard for Arabic LLMs built on the 3C3H evaluation framework, which combines factual accuracy and usability scoring via an LLM-as-a-Judge. The evaluation pipeline runs three-month blind testing cycles; the private datasets and evaluation code are released only after each cycle ends, which helps prevent data leakage and keeps the benchmark current. The framework scores six dimensions (Correctness, Completeness, Conciseness, Helpfulness, Honesty, Harmlessness) and is paired with a Task Leaderboard (question answering, reasoning, orthographic analysis, safety), pushing teams to optimize Arabic models for both knowledge and user-aligned behavior.
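To illustrate how a multi-dimension judge score might collapse into a single leaderboard number, here is a minimal sketch. It assumes Correctness and Completeness are binary gates while the other four dimensions are rated 1-5 and normalized; the actual weighting and scales are defined by the AraGen 3C3H framework, and the names below are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class JudgeRating:
    """Hypothetical per-response ratings for the six 3C3H dimensions."""
    correctness: int   # assumed binary: 0 or 1
    completeness: int  # assumed binary: 0 or 1
    conciseness: int   # assumed 1..5 scale
    helpfulness: int   # assumed 1..5 scale
    honesty: int       # assumed 1..5 scale
    harmlessness: int  # assumed 1..5 scale


def normalize(score: int) -> float:
    """Map a 1-5 rating onto [0, 1]."""
    return (score - 1) / 4


def score_3c3h(r: JudgeRating) -> float:
    """Average all six dimensions into a single [0, 1] score.

    Equal weighting is an assumption for this sketch, not the
    framework's published formula.
    """
    dims = [
        float(r.correctness),
        float(r.completeness),
        normalize(r.conciseness),
        normalize(r.helpfulness),
        normalize(r.honesty),
        normalize(r.harmlessness),
    ]
    return sum(dims) / len(dims)


rating = JudgeRating(1, 1, 4, 5, 5, 4)
print(round(score_3c3h(rating), 3))  # → 0.917
```

A real pipeline would aggregate such per-response scores over the benchmark's private test set before ranking models on the leaderboard.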
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info