Open R1: Update #3 — OlympicCoder models benchmarked on IOI and CodeForces
AI Impact Summary
Open R1: Update #3 introduces new datasets and a benchmark for evaluating code reasoning models on competitive programming problems. The release of CodeForces-CoTs, the OlympicCoder models, and the IOI benchmark provides verifiable data for assessing model performance, particularly in C++ and Python. The update highlights a limitation of existing datasets such as DeepMind's CodeContests, which include only a small subset of test cases per problem, and demonstrates the strength of models trained on the new, more comprehensive dataset of R1-generated reasoning traces.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info