OpenR1 Update #2: OpenR1-Math-220k large-scale math reasoning dataset
AI Impact Summary
OpenR1 Update #2 details the construction of OpenR1-Math-220k, a large-scale math-reasoning dataset built by generating two model answers for each of 400k problems on 512 H100 GPUs using a local vLLM/SGLang pipeline. The workflow combines automated filtering with Math Verify and a Llama-3.3-70B-Instruct judge to improve answer correctness: the 800k raw reasoning traces are filtered down to 220k validated problems, with multiple solutions per problem. This demonstrates that distillation from DeepSeek R1 into Qwen- and Llama-family models can reach competitive math-reasoning performance without RL, and it provides a scalable data-generation path (including rejection sampling and DPO-friendly formats) that could accelerate fine-tuning of smaller models. For engineering teams, this means a ready-made, high-quality dataset and an end-to-end local-generation pipeline for boosting reasoning capabilities without the limits of hosted APIs, at significant GPU and storage cost but with the potential for faster iteration.
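The verification step described above (keep only traces whose final answer checks out against the reference answer, drop problems where no trace survives) can be sketched as follows. This is a minimal, illustrative stand-in: the real pipeline uses the Math Verify library for symbolic answer comparison and an LLM judge for ambiguous cases, whereas the `extract_boxed`, `answers_match`, and `filter_traces` names here are hypothetical and the matcher is a naive normalized string comparison.

```python
import re


def extract_boxed(trace):
    """Pull the final \\boxed{...} answer out of a reasoning trace, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", trace)
    return matches[-1] if matches else None


def answers_match(gold, candidate):
    """Stand-in for Math Verify: naive whitespace/case-insensitive match.
    The real library parses both sides into symbolic form and compares them."""
    if candidate is None:
        return False
    norm = lambda s: s.replace(" ", "").lower()
    return norm(gold) == norm(candidate)


def filter_traces(problems):
    """problems maps problem_id -> (gold_answer, [trace, ...]).
    Keep only verified traces; drop problems with no surviving trace."""
    kept = {}
    for pid, (gold, traces) in problems.items():
        good = [t for t in traces if answers_match(gold, extract_boxed(t))]
        if good:
            kept[pid] = good
    return kept
```

With two sampled traces per problem, this rejection-sampling filter is what shrinks 400k problems to the 220k that carry at least one verified solution; in the actual pipeline, problems rejected here get a second chance via the Llama-3.3-70B-Instruct judge.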
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info