Improved Batch Inference API: 30B Token Rate Limit & Cost Reduction
Action Required
Organizations using the Batch Inference API will benefit from increased throughput and reduced costs for high-volume data processing workloads.
AI Impact Summary
The Batch Inference API has received significant enhancements: a substantially higher rate limit (30B tokens), expanded model support across serverless and private deployments, and a 50% cost reduction relative to the real-time API. These changes materially shift the API's capabilities and economics for workflows that rely on high-throughput batch processing. The move to a fully prepaid billing model also introduces a new operational consideration for users.
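To make the economics concrete, the pricing math can be sketched as below. The 50% batch discount and the 30B-token limit come from this announcement; the per-token price and workload size are illustrative assumptions only, not official figures.

```python
# Hedged sketch: estimate batch vs. real-time cost for a workload.
# Only the 50% discount and 30B-token limit are from the announcement;
# the price and token counts are hypothetical placeholders.

RATE_LIMIT_TOKENS = 30_000_000_000          # 30B-token rate limit (announced)
BATCH_DISCOUNT = 0.50                       # batch costs 50% of real-time (announced)
REALTIME_PRICE_PER_1M_TOKENS = 2.00         # assumed USD price, for illustration

def job_cost(tokens: int, price_per_1m: float, discount: float = 0.0) -> float:
    """Cost in USD for a job of `tokens` tokens at the given discount."""
    return tokens / 1_000_000 * price_per_1m * (1.0 - discount)

tokens = 500_000_000                        # hypothetical 500M-token workload
assert tokens <= RATE_LIMIT_TOKENS          # fits well under the 30B limit

realtime_cost = job_cost(tokens, REALTIME_PRICE_PER_1M_TOKENS)
batch_cost = job_cost(tokens, REALTIME_PRICE_PER_1M_TOKENS, BATCH_DISCOUNT)
print(realtime_cost, batch_cost)            # batch is half the real-time cost
```

Under the prepaid billing model, an estimate like `batch_cost` would need to be funded up front before the job runs.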
Affected Systems
- Date: not specified
- Change type: pricing
- Severity: medium