Apriel-H1: Distilling efficient reasoning with Mamba hybrids for 15B models
AI Impact Summary
Researchers converted a 15B reasoning model into a Mamba hybrid, achieving roughly 2.1x throughput with minimal quality loss by distilling reasoning-specific signals rather than general pretraining data. The key ingredients are high-quality reasoning traces drawn from the teacher's SFT dataset and a reverse KL divergence objective that biases the student toward the teacher's confident, correct reasoning steps, combined with staged distillation using leave-one-out (LOO) based layer pruning and MIL Mamba replacement. Reported results span roughly 2.1x to 3.4x throughput across the H-30-SFT through H-40 variants, with MMLU, GPQA, MTBench, and GSM8k scores showing where quality is preserved and where trade-offs appear on specific tasks. For teams, this offers a concrete path to more efficient inference for 15B reasoning models without full retraining, building on Fast-LLM and Mamba, though it requires rigorous data curation and a staged replacement plan.
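To make the distillation objective concrete, below is a minimal PyTorch sketch of a reverse KL loss, KL(student || teacher), computed per token over the vocabulary. The function name, signature, and temperature parameter are illustrative assumptions, not taken from the Apriel-H1 codebase; the point is only that taking the expectation under the student distribution makes the loss mode-seeking, which is what biases the student toward the teacher's high-confidence steps.

```python
import torch
import torch.nn.functional as F

def reverse_kl_distillation_loss(student_logits: torch.Tensor,
                                 teacher_logits: torch.Tensor,
                                 temperature: float = 1.0) -> torch.Tensor:
    """Reverse KL, i.e. KL(student || teacher), averaged over all tokens.

    Unlike forward KL, the expectation is taken under the student
    distribution, so the student is penalized most for putting mass
    where the teacher assigns low probability. This mode-seeking
    behavior concentrates the student on the teacher's confident
    predictions rather than spreading mass over the full distribution.
    """
    # Logits have shape (batch, seq_len, vocab_size).
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_probs = s_log_probs.exp()
    # KL(p_s || p_t) = E_{p_s}[log p_s - log p_t], summed over the vocabulary.
    kl = (s_probs * (s_log_probs - t_log_probs)).sum(dim=-1)
    return kl.mean()

# Illustrative usage with random logits (shapes are placeholders).
student = torch.randn(2, 16, 32000)
teacher = torch.randn(2, 16, 32000)
loss = reverse_kl_distillation_loss(student, teacher)
```

In practice the teacher logits would come from the frozen 15B transformer teacher and the student logits from the Mamba-hybrid student, evaluated on the curated reasoning traces; only the student receives gradients.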
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info