Improved Batch Inference API: 30B Token Rate Limit & Cost Reduction
Action Required
Organizations using the Batch Inference API will benefit from increased throughput and reduced costs for high-volume data processing workloads.
AI Impact Summary
The Batch Inference API has received significant enhancements: a substantially higher rate limit (30B tokens), expanded model support across serverless and private deployments, and a 50% cost reduction relative to the real-time API. These changes materially shift the API's capabilities and economics for workflows that rely on high-throughput batch processing. The move to a fully prepaid billing model also introduces a new operational consideration for users.
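To make the economics concrete, the pricing math can be sketched as below. The 50% batch discount and the 30B-token limit come from this announcement; the per-token price and workload size are illustrative assumptions only, not official figures.

```python
# Hedged sketch: estimate batch vs. real-time cost for a workload.
# Only the 50% discount and 30B-token limit are from the announcement;
# the price and token counts are hypothetical placeholders.

RATE_LIMIT_TOKENS = 30_000_000_000          # 30B-token rate limit (announced)
BATCH_DISCOUNT = 0.50                       # batch costs 50% of real-time (announced)
REALTIME_PRICE_PER_1M_TOKENS = 2.00         # assumed USD price, for illustration

def job_cost(tokens: int, price_per_1m: float, discount: float = 0.0) -> float:
    """Cost in USD for a job of `tokens` tokens at the given discount."""
    return tokens / 1_000_000 * price_per_1m * (1.0 - discount)

tokens = 500_000_000                        # hypothetical 500M-token workload
assert tokens <= RATE_LIMIT_TOKENS          # fits well under the 30B limit

realtime_cost = job_cost(tokens, REALTIME_PRICE_PER_1M_TOKENS)
batch_cost = job_cost(tokens, REALTIME_PRICE_PER_1M_TOKENS, BATCH_DISCOUNT)
print(realtime_cost, batch_cost)            # batch is half the real-time cost
```

Under the prepaid billing model, an estimate like `batch_cost` would need to be funded up front before the job runs.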
Affected Systems
- Date: not specified
- Change type: pricing
- Severity: medium