IBM releases Bamba-9B: Hybrid Mamba2 Model for Efficient Inference
AI Impact Summary
IBM has released Bamba-9B, an inference-efficient hybrid Mamba2 model trained on 2.2T tokens that demonstrates a 2.5x throughput improvement and a 2x latency reduction over standard transformers when served with vLLM. The model is immediately available for experimentation via Hugging Face, vLLM, TRL, and llama.cpp, and the team is actively working to close gaps on math benchmarks and MMLU by extending pretraining and incorporating high-quality math data.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info