Open-R1 Update #1: Reproducing DeepSeek-R1 training pipeline and synthetic data
AI Impact Summary
Open-R1 is a community-driven effort to replicate DeepSeek-R1's training pipeline and synthetic data. The update confirms reproduced MATH-500 scores for several DeepSeek-R1 distilled models and describes a training pipeline built around TRL GRPO, DeepSpeed ZeRO, and vLLM to enable scalable, parallel generation. It details hardware scaling experiments for synthetic data generation, moving from 2x8xH100 to 4x8xH100 (32 GPUs), and describes a shift from batched to streaming inference that stabilized GPU utilization, with completions averaging around 6k tokens and some exceeding 20k. Reproducing DeepSeek-R1 at scale thus demands substantial memory and orchestration, informing resource planning, cost, and timeline considerations for teams benchmarking or improving this model lineage.
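The batched-to-streaming shift can be illustrated with a small scheduling simulation (a hypothetical sketch, not the Open-R1 code): with fixed batches, every batch waits for its longest completion, so a single 20k-token outlier stalls slots whose sequences finished at around 6k tokens, while streaming (continuous batching, as vLLM implements it) hands a freed slot the next prompt immediately.

```python
import heapq
import random

def batched_makespan(lengths, slots):
    """Fixed batches of `slots` sequences decode in lock-step;
    each batch lasts as long as its longest completion."""
    return sum(max(lengths[i:i + slots])
               for i in range(0, len(lengths), slots))

def streaming_makespan(lengths, slots):
    """Continuous batching: whenever a slot finishes a sequence,
    it immediately starts decoding the next prompt (greedy list scheduling)."""
    finish_times = [0] * slots
    heapq.heapify(finish_times)
    for n in lengths:
        t = heapq.heappop(finish_times)      # slot that frees up first
        heapq.heappush(finish_times, t + n)  # occupy it for n more decode steps
    return max(finish_times)

# Synthetic completion lengths mirroring the update's numbers:
# ~6k tokens on average, with a long tail above 20k.
random.seed(0)
lengths = ([random.randint(4_000, 8_000) for _ in range(60)]
           + [random.randint(20_000, 30_000) for _ in range(4)])
random.shuffle(lengths)

batched = batched_makespan(lengths, slots=8)
streaming = streaming_makespan(lengths, slots=8)
print(f"batched:   {batched} decode steps, "
      f"utilization {sum(lengths) / (8 * batched):.0%}")
print(f"streaming: {streaming} decode steps, "
      f"utilization {sum(lengths) / (8 * streaming):.0%}")
```

At equal parallelism, streaming's makespan is never worse than fixed batching's, and with long-tailed completion lengths the utilization gap becomes large, which is consistent with the stabilized GPU utilization the update reports.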
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info