The Hallucinations Leaderboard — open benchmark for LLM factuality and faithfulness
AI Impact Summary
The Hallucinations Leaderboard is an open, ongoing benchmark suite for quantifying LLM factuality and faithfulness across a broad set of tasks and datasets. It is built on the EleutherAI Language Model Evaluation Harness and forked from the Hugging Face Leaderboard Template; experiments were run on HPC resources at the University of Edinburgh using NVIDIA A100 GPUs. For a technical organisation, adopting these benchmarks creates a reproducible path to compare internal and external models on hallucination risk before production deployment, supporting better-informed vendor selection and feature design to reduce misinformation risk.
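Because the leaderboard builds on the EleutherAI Language Model Evaluation Harness, a comparable local evaluation can be run with the harness's `lm_eval` CLI. The model name, task selection, and hardware flags below are illustrative assumptions, not the leaderboard's official configuration:

```shell
# Sketch: score a Hugging Face model on hallucination-related tasks with
# the EleutherAI lm-evaluation-harness (installed via `pip install lm-eval`).
# The pretrained model and task list are illustrative choices; the
# leaderboard's exact task set may differ.
lm_eval \
  --model hf \
  --model_args pretrained=mistralai/Mistral-7B-v0.1,dtype=bfloat16 \
  --tasks truthfulqa_mc2,triviaqa \
  --device cuda:0 \
  --batch_size 8 \
  --output_path results/
```

Running the same command against each candidate model yields directly comparable per-task scores, which is the reproducible evaluation path described above.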
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info