Apriel-H1: Distilling efficient reasoning with Mamba hybrids for 15B models
AI Impact Summary
Researchers converted a 15B reasoning model into a Mamba hybrid, achieving roughly 2.1x throughput with minimal quality loss by distilling reasoning-specific signals rather than general pretraining data. The key ingredients are high-quality reasoning traces drawn from the teacher's SFT dataset and a reverse KL divergence objective that biases the student toward the teacher's confident, correct reasoning steps, combined with staged distillation using leave-one-out (LOO) based layer pruning and MIL Mamba replacement. Reported results span roughly 2.1x to 3.4x throughput across the H-30-SFT through H-40 variants, with MMLU, GPQA, MTBench, and GSM8k scores showing where quality is preserved and where trade-offs appear on specific tasks. For teams, this offers a concrete path to more efficient inference for 15B reasoning models without full retraining, building on Fast-LLM and Mamba, though it requires rigorous data curation and a staged replacement plan.
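To make the distillation objective concrete, below is a minimal PyTorch sketch of a reverse KL loss, KL(student || teacher), computed per token over the vocabulary. The function name, signature, and temperature parameter are illustrative assumptions, not taken from the Apriel-H1 codebase; the point is only that taking the expectation under the student distribution makes the loss mode-seeking, which is what biases the student toward the teacher's high-confidence steps.

```python
import torch
import torch.nn.functional as F

def reverse_kl_distillation_loss(student_logits: torch.Tensor,
                                 teacher_logits: torch.Tensor,
                                 temperature: float = 1.0) -> torch.Tensor:
    """Reverse KL, i.e. KL(student || teacher), averaged over all tokens.

    Unlike forward KL, the expectation is taken under the student
    distribution, so the student is penalized most for putting mass
    where the teacher assigns low probability. This mode-seeking
    behavior concentrates the student on the teacher's confident
    predictions rather than spreading mass over the full distribution.
    """
    # Logits have shape (batch, seq_len, vocab_size).
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_probs = s_log_probs.exp()
    # KL(p_s || p_t) = E_{p_s}[log p_s - log p_t], summed over the vocabulary.
    kl = (s_probs * (s_log_probs - t_log_probs)).sum(dim=-1)
    return kl.mean()

# Illustrative usage with random logits (shapes are placeholders).
student = torch.randn(2, 16, 32000)
teacher = torch.randn(2, 16, 32000)
loss = reverse_kl_distillation_loss(student, teacher)
```

In practice the teacher logits would come from the frozen 15B transformer teacher and the student logits from the Mamba-hybrid student, evaluated on the curated reasoning traces; only the student receives gradients.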
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info