Introducing LiveCodeBench Leaderboard - Contamination-Free Code LLM Evaluation
AI Impact Summary
The LiveCodeBench leaderboard introduces a contamination-free benchmark for evaluating code LLMs, addressing a critical weakness of existing benchmarks such as HumanEval: models may have seen the test problems during training. LiveCodeBench continuously collects new coding problems from LeetCode, AtCoder, and CodeForces, annotating each with its release date so that a model can be evaluated only on problems published after its training cutoff. This mitigates contamination and overfitting, yielding a more realistic assessment of model capabilities. The leaderboard also evaluates models holistically across four scenarios: code generation, self-repair, test output prediction, and code execution.
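The release-date annotation is the core of the contamination defense. As a minimal sketch of how such date-based filtering might work, the snippet below keeps only problems published after a model's training cutoff; `Problem` and `filter_uncontaminated` are illustrative names for this example, not LiveCodeBench's actual API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Problem:
    """A coding problem annotated with its public release date.

    Illustrative structure; LiveCodeBench's real schema may differ.
    """
    problem_id: str
    source: str          # e.g. "LeetCode", "AtCoder", "CodeForces"
    release_date: date
    statement: str

def filter_uncontaminated(problems: list[Problem], training_cutoff: date) -> list[Problem]:
    """Keep only problems released after the model's training cutoff,
    so the model cannot have seen them (or their solutions) in training data."""
    return [p for p in problems if p.release_date > training_cutoff]

# Example: a model whose training data ends 2023-09-01 is evaluated
# only on problems published after that date.
problems = [
    Problem("lc-2873", "LeetCode", date(2023, 10, 1), "..."),
    Problem("abc-321-d", "AtCoder", date(2023, 6, 15), "..."),
]
eval_set = filter_uncontaminated(problems, training_cutoff=date(2023, 9, 1))
print([p.problem_id for p in eval_set])  # ['lc-2873']
```

Because the benchmark keeps accumulating fresh problems, this filtering step can be re-applied per model, so every model is scored on a window of problems it could not have memorized.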
Affected Systems
- Date: Not specified
- Change type: capability
- Severity: info