Open-R1 Update #1: Reproducing DeepSeek-R1 training pipeline and synthetic data
AI Impact Summary
Open-R1 is a community-driven effort to replicate DeepSeek-R1's training pipeline and synthetic data. The update confirms reproduced MATH-500 scores for several DeepSeek-R1 distilled models and describes a training pipeline built around TRL GRPO, DeepSpeed ZeRO, and vLLM to enable scalable, parallel generation. It details hardware scaling experiments for synthetic data generation, moving from 2x8xH100 to 4x8xH100 (32 GPUs), and describes a shift from batched to streaming inference that stabilized GPU utilization, with completions averaging around 6k tokens and some exceeding 20k. Reproducing DeepSeek-R1 at scale thus demands substantial memory and orchestration, informing resource planning, cost, and timeline considerations for teams benchmarking or improving this model lineage.
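The batched-to-streaming shift can be illustrated with a small scheduling simulation (a hypothetical sketch, not the Open-R1 code): with fixed batches, every batch waits for its longest completion, so a single 20k-token outlier stalls slots whose sequences finished at around 6k tokens, while streaming (continuous batching, as vLLM implements it) hands a freed slot the next prompt immediately.

```python
import heapq
import random

def batched_makespan(lengths, slots):
    """Fixed batches of `slots` sequences decode in lock-step;
    each batch lasts as long as its longest completion."""
    return sum(max(lengths[i:i + slots])
               for i in range(0, len(lengths), slots))

def streaming_makespan(lengths, slots):
    """Continuous batching: whenever a slot finishes a sequence,
    it immediately starts decoding the next prompt (greedy list scheduling)."""
    finish_times = [0] * slots
    heapq.heapify(finish_times)
    for n in lengths:
        t = heapq.heappop(finish_times)      # slot that frees up first
        heapq.heappush(finish_times, t + n)  # occupy it for n more decode steps
    return max(finish_times)

# Synthetic completion lengths mirroring the update's numbers:
# ~6k tokens on average, with a long tail above 20k.
random.seed(0)
lengths = ([random.randint(4_000, 8_000) for _ in range(60)]
           + [random.randint(20_000, 30_000) for _ in range(4)])
random.shuffle(lengths)

batched = batched_makespan(lengths, slots=8)
streaming = streaming_makespan(lengths, slots=8)
print(f"batched:   {batched} decode steps, "
      f"utilization {sum(lengths) / (8 * batched):.0%}")
print(f"streaming: {streaming} decode steps, "
      f"utilization {sum(lengths) / (8 * streaming):.0%}")
```

At equal parallelism, streaming's makespan is never worse than fixed batching's, and with long-tailed completion lengths the utilization gap becomes large, which is consistent with the stabilized GPU utilization the update reports.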
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info