Open R1: OpenR1-Math-220k dataset released — 220k math reasoning traces
AI Impact Summary
Open R1 has released OpenR1-Math-220k, a large-scale dataset of math reasoning traces produced with a distributed generation pipeline running on H100 GPUs. For curation, the team generates two answers per problem, filters them with Math Verify, and uses an LLM judge to recover solutions that the filter would otherwise reject. The project is a substantial investment in synthetic data for training reasoning models, with benchmarks such as AIME 2024 as a target.
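The curation flow described above (several answers per problem, a rule-based check, then a judge as a fallback) can be sketched roughly as follows. This is a minimal illustration, not the Open R1 implementation: `Trace`, `filter_traces`, `exact_match`, and `toy_judge` are hypothetical names, the exact-match check merely stands in for Math Verify, and the fraction comparison stands in for an LLM judge.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """One generated answer for a problem (hypothetical structure)."""
    problem: str
    reference: str  # ground-truth final answer
    answer: str     # final answer extracted from the generated trace


def exact_match(answer: str, reference: str) -> bool:
    # Stand-in for a rule-based verifier such as Math Verify (assumption):
    # accepts only answers that match the reference string exactly.
    return answer.strip() == reference.strip()


def toy_judge(trace: Trace) -> bool:
    # Stand-in for an LLM judge (assumption): accepts answers that are
    # numerically equal to the reference even if formatted differently.
    try:
        return Fraction(trace.answer) == Fraction(trace.reference)
    except (ValueError, ZeroDivisionError):
        return False


def filter_traces(
    traces: List[Trace],
    rule_check: Callable[[str, str], bool],
    judge: Callable[[Trace], bool],
) -> List[Tuple[Trace, str]]:
    """Keep traces accepted by the rule check; fall back to the judge
    only for traces the rule check rejects."""
    kept = []
    for trace in traces:
        if rule_check(trace.answer, trace.reference):
            kept.append((trace, "rule"))
        elif judge(trace):
            kept.append((trace, "judge"))
    return kept


# Two answers for the first problem, one (wrong) answer for the second.
sample = [
    Trace("p1", "1/2", "1/2"),
    Trace("p1", "1/2", "0.5"),
    Trace("p2", "3", "4"),
]
kept = filter_traces(sample, exact_match, toy_judge)
for trace, how in kept:
    print(trace.problem, trace.answer, how)
```

In this sketch the judge runs only on rejected traces, which keeps the more expensive check off the common path; whether the actual pipeline orders the steps this way is an assumption.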
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info