Apriel-H1-15b-Thinker-SFT achieves 2.1x throughput with Mamba-based distillation in Fast-LLM
AI Impact Summary
Apriel-H1 demonstrates retrofit distillation of a 15B reasoning model into a Mamba-based hybrid, delivering 2.1x throughput with minimal quality loss by training on teacher reasoning traces rather than generic pretraining data. It uses a staged distillation workflow (LOO-based layer replacements, MIL/MMR, and end-to-end SFT) and a reverse KL objective to preserve structured reasoning while attention layers are swapped for linear (Mamba) mixers. This offers a practical path to scaling inference for reasoning-heavy workloads without rebuilding from scratch, contingent on access to high-quality SFT data and a capable training framework such as Fast-LLM.
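To make the objective concrete, here is a minimal, illustrative sketch of a reverse KL distillation loss over a single token's logits. This is an assumption-laden toy (function names and the plain-Python softmax are ours, not from the Apriel-H1 or Fast-LLM codebases); it only shows the direction of the divergence, KL(student || teacher), whose mode-seeking behavior penalizes the student for placing probability mass on tokens the teacher rules out.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reverse_kl(student_logits, teacher_logits):
    # Reverse KL: KL(q_student || p_teacher) = sum_i q_i * (log q_i - log p_i).
    # Mode-seeking: the penalty grows sharply where the student assigns mass
    # but the teacher does not, which is the argued mechanism for preserving
    # the teacher's structured reasoning during layer replacement.
    q = softmax(student_logits)
    p = softmax(teacher_logits)
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))
```

Identical distributions give zero loss; a student concentrated on a token the teacher disfavors gives a large positive loss.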
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info