Hallucinations Leaderboard: Open LLM Evaluation Benchmark
AI Impact Summary
The Hallucinations Leaderboard is an open-source effort to quantify and track hallucination rates across a diverse set of large language models. Built on the EleutherAI Language Model Evaluation Harness and run on high-performance computing infrastructure, the project establishes a standardized benchmark for evaluating LLM reliability. The leaderboard is valuable to developers and researchers seeking to mitigate the risks of inaccurate or misleading model output.
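Evaluations with the Language Model Evaluation Harness are typically driven from its CLI. A minimal sketch, assuming the `lm-eval` package is installed and using an illustrative model and task (`truthfulqa_mc2` is a harness task often used for hallucination-related evaluation; the leaderboard's exact task list and settings may differ):

```shell
# Install the EleutherAI evaluation harness (assumption: recent 0.4.x release).
pip install lm-eval

# Evaluate a Hugging Face model on a hallucination-sensitive task.
# Model name, task choice, and batch size are illustrative, not the
# leaderboard's exact configuration.
lm_eval --model hf \
  --model_args pretrained=mistralai/Mistral-7B-v0.1 \
  --tasks truthfulqa_mc2 \
  --batch_size 8 \
  --output_path results/
```

The harness writes per-task metrics as JSON under `--output_path`, which is the kind of raw output a leaderboard like this would aggregate across models.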
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info