Together AI boosts DeepSeek-R1 speed with custom speculative decoding
Action Required
To benefit, customers (particularly those on dedicated endpoints) should train a custom speculator on their own inference traffic, which can significantly reduce the cost and latency of DeepSeek-R1 inference and enable faster, more efficient AI-powered applications.
AI Impact Summary
Together AI is releasing a new capability: customized speculative decoding for DeepSeek-R1, allowing customers to achieve significant speedups (1.23-1.45x) and cost reductions (25-55%) by training a custom speculator on their own inference traffic. This is particularly valuable for latency-sensitive applications like social media engagement and résumé screening, where faster response times and lower GPU costs are critical. Customers using dedicated endpoints will benefit most from this optimization.
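The speedup comes from the standard speculative-decoding loop: a small speculator drafts several tokens cheaply, and the large target model verifies them in a single pass, so multiple tokens can be emitted per expensive target step. The sketch below illustrates the greedy variant of that loop with toy stand-in models; `draft_next` and `target_next` are hypothetical placeholders, not Together AI's actual API, and real systems verify all drafted tokens in one batched forward pass rather than a Python loop.

```python
# Minimal sketch of greedy speculative decoding with toy integer "models".
# draft_next / target_next are hypothetical stand-ins for a small custom
# speculator and the large target model (e.g. DeepSeek-R1).

def draft_next(context):
    # Toy speculator: agrees with the target early on, then drifts.
    return context[-1] + 1 if context[-1] < 3 else 0

def target_next(context):
    # Toy target model: always emits the next integer.
    return context[-1] + 1

def speculative_step(context, k=4):
    """Draft k tokens with the speculator, then verify against the target.

    The accepted prefix plus one correction token is produced for the cost
    of a single target verification (batched in a real system), which is
    where the latency and cost savings come from.
    """
    # Draft phase: the speculator proposes k tokens autoregressively.
    proposed, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)

    # Verify phase: accept draft tokens while the target agrees (greedy check).
    accepted, ctx = [], list(context)
    for t in proposed:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            break

    # Always emit one token from the target (a correction/bonus token),
    # so the step makes progress even when no draft token is accepted.
    accepted.append(target_next(ctx))
    return context + accepted

print(speculative_step([1]))  # → [1, 2, 3, 4]: two drafts accepted + 1 bonus
```

A better-matched speculator accepts longer prefixes per step, which is why training it on a customer's own inference traffic improves the speedup.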
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high