Info / Capability

Open R1: OpenR1-Math-220k dataset released — 220k math reasoning traces

AI Impact Summary

Open R1 has released OpenR1-Math-220k, a large-scale math reasoning dataset generated with a distributed pipeline running on H100 GPUs. The curation process generates two answers per problem, filters them with Math Verify, and uses an LLM judge to recover correct solutions that the rule-based filter rejected. The project is a substantial investment in synthetic data for training reasoning models, with benchmarks such as AIME 2024 as a target.
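The two-stage filtering described above (a rule-based check first, then a judge as a fallback) can be illustrated with a minimal sketch. This is not the Open R1 pipeline: `Trace`, `rule_based_match`, `toy_judge`, and `filter_traces` are hypothetical names, the string comparison merely stands in for Math Verify, and the fraction-based judge stands in for an LLM call.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List, Optional


@dataclass
class Trace:
    """One generated solution for a problem (hypothetical structure)."""
    problem: str
    reference_answer: str
    generated_answer: str


def rule_based_match(reference: str, candidate: str) -> bool:
    """Stand-in for a rule-based verifier: compares whitespace-stripped strings."""
    def normalize(text: str) -> str:
        return text.replace(" ", "").lower()
    return normalize(reference) == normalize(candidate)


def toy_judge(trace: Trace) -> bool:
    """Stand-in for an LLM judge: accepts answers that are numerically equal."""
    try:
        return Fraction(trace.reference_answer) == Fraction(trace.generated_answer)
    except (ValueError, ZeroDivisionError):
        return False


def filter_traces(
    traces: List[Trace],
    verifier: Callable[[str, str], bool] = rule_based_match,
    judge: Optional[Callable[[Trace], bool]] = None,
) -> List[Trace]:
    """Keep traces the verifier accepts; ask the judge only about rejected ones."""
    kept = []
    for trace in traces:
        if verifier(trace.reference_answer, trace.generated_answer):
            kept.append(trace)
        elif judge is not None and judge(trace):
            kept.append(trace)
    return kept


# Assumed sample data: two generated answers for one problem, one for another.
traces = [
    Trace("What is 1 + 1?", "2", " 2 "),
    Trace("What is 1 + 1?", "2", "3"),
    Trace("What is one half?", "1/2", "0.5"),
]
strict = filter_traces(traces)
recovered = filter_traces(traces, judge=toy_judge)
```

In this sketch the judge runs only on traces the cheap check rejects, so the expensive step is limited to cases such as differently formatted but equivalent answers.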

Affected Systems

Open R1, DeepSeek R1

Date: not specified
Change type: capability
Severity: info

