IBM releases Bamba-9B: Hybrid Mamba2 Model for Efficient Inference
AI Impact Summary
IBM has released Bamba-9B, an inference-efficient hybrid Mamba2 model trained on 2.2T tokens that demonstrates a 2.5x throughput improvement and a 2x latency reduction over standard transformers when served with vLLM. The model is immediately available for experimentation via Hugging Face, vLLM, TRL, and llama.cpp, and the team is actively working to close gaps on math benchmarks and MMLU by extending pretraining and incorporating high-quality math data.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info