Together AI boosts DeepSeek-R1 speed with custom speculative decoding
Action Required
To benefit, customers (particularly those on dedicated endpoints) should train a custom speculator on their own inference traffic, which can significantly reduce the cost and latency of DeepSeek-R1 inference and enable faster, more efficient AI-powered applications.
AI Impact Summary
Together AI is releasing a new capability: customized speculative decoding for DeepSeek-R1, allowing customers to achieve significant speedups (1.23-1.45x) and cost reductions (25-55%) by training a custom speculator on their own inference traffic. This is particularly valuable for latency-sensitive applications like social media engagement and résumé screening, where faster response times and lower GPU costs are critical. Customers using dedicated endpoints will benefit most from this optimization.
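The speedup comes from the standard speculative-decoding loop: a small speculator drafts several tokens cheaply, and the large target model verifies them in a single pass, so multiple tokens can be emitted per expensive target step. The sketch below illustrates the greedy variant of that loop with toy stand-in models; `draft_next` and `target_next` are hypothetical placeholders, not Together AI's actual API, and real systems verify all drafted tokens in one batched forward pass rather than a Python loop.

```python
# Minimal sketch of greedy speculative decoding with toy integer "models".
# draft_next / target_next are hypothetical stand-ins for a small custom
# speculator and the large target model (e.g. DeepSeek-R1).

def draft_next(context):
    # Toy speculator: agrees with the target early on, then drifts.
    return context[-1] + 1 if context[-1] < 3 else 0

def target_next(context):
    # Toy target model: always emits the next integer.
    return context[-1] + 1

def speculative_step(context, k=4):
    """Draft k tokens with the speculator, then verify against the target.

    The accepted prefix plus one correction token is produced for the cost
    of a single target verification (batched in a real system), which is
    where the latency and cost savings come from.
    """
    # Draft phase: the speculator proposes k tokens autoregressively.
    proposed, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)

    # Verify phase: accept draft tokens while the target agrees (greedy check).
    accepted, ctx = [], list(context)
    for t in proposed:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            break

    # Always emit one token from the target (a correction/bonus token),
    # so the step makes progress even when no draft token is accepted.
    accepted.append(target_next(ctx))
    return context + accepted

print(speculative_step([1]))  # → [1, 2, 3, 4]: two drafts accepted + 1 bonus
```

A better-matched speculator accepts longer prefixes per step, which is why training it on a customer's own inference traffic improves the speedup.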
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high