NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models
AI Impact Summary
This competition, hosted by the Technology Innovation Institute (TII), focuses on developing new benchmarks for evaluating Large Language Models (LLMs) in the early stages of training. The goal is to capture meaningful signals during the initial training phases, particularly for scientific knowledge, where existing benchmarks fail to do so. Participants use the lm-evaluation-harness library and submit solutions through a Hugging Face Space. Submissions are scored automatically and ranked on a leaderboard according to signal quality, ranking consistency, and compliance with scientific knowledge.
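As a rough illustration of what "ranking consistency" could mean in practice, the sketch below compares how a benchmark orders a set of models at an early checkpoint with how it orders them late in training, using Kendall's tau. This is an assumption for illustration only: the summary does not state the competition's actual scoring formulas, and the model names, scores, and the `kendall_tau` helper are all hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score lists over the same models.

    Returns 1.0 when both lists order the models identically and
    -1.0 when the orderings are exactly reversed. Tied pairs count
    as neither concordant nor discordant.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two models to compare rankings")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells us whether both lists agree
        # on which of the two models scores higher.
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical benchmark accuracies for three models (not real results).
early = {"model_a": 0.26, "model_b": 0.31, "model_c": 0.29}
late = {"model_a": 0.41, "model_b": 0.55, "model_c": 0.48}

models = sorted(early)
tau = kendall_tau([early[m] for m in models], [late[m] for m in models])
print(f"Kendall tau between early and late rankings: {tau:.2f}")
```

A value near 1 would suggest that the benchmark already ranks models early in training the same way it does later, which is the kind of early signal the competition description asks for.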
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium