The Hallucinations Leaderboard — open benchmark for LLM factuality and faithfulness
AI Impact Summary
The Hallucinations Leaderboard is an open, ongoing benchmark suite for quantifying LLM factuality and faithfulness across a broad set of tasks and datasets. It is built on the EleutherAI Language Model Evaluation Harness and forked from the Hugging Face Leaderboard Template; experiments were run on HPC resources at the University of Edinburgh using NVIDIA A100 GPUs. For a technical organisation, adopting these benchmarks creates a reproducible path to compare internal and external models on hallucination risk before production deployment, supporting better-informed vendor selection and feature design to reduce misinformation risk.
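Because the leaderboard builds on the EleutherAI Language Model Evaluation Harness, a comparable local evaluation can be run with the harness's `lm_eval` CLI. The model name, task selection, and hardware flags below are illustrative assumptions, not the leaderboard's official configuration:

```shell
# Sketch: score a Hugging Face model on hallucination-related tasks with
# the EleutherAI lm-evaluation-harness (installed via `pip install lm-eval`).
# The pretrained model and task list are illustrative choices; the
# leaderboard's exact task set may differ.
lm_eval \
  --model hf \
  --model_args pretrained=mistralai/Mistral-7B-v0.1,dtype=bfloat16 \
  --tasks truthfulqa_mc2,triviaqa \
  --device cuda:0 \
  --batch_size 8 \
  --output_path results/
```

Running the same command against each candidate model yields directly comparable per-task scores, which is the reproducible evaluation path described above.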
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info