Together Batch API launches with 50% cost savings for asynchronous LLM processing
AI Impact Summary
Together is launching a Batch API that processes LLM requests asynchronously at half the price of real-time inference, targeting non-urgent workloads such as evaluations, data transformations, and offline summarization. A batch can include up to 50,000 requests in a single JSONL file of up to 100MB; jobs are typically completed within hours, with a best-effort 24-hour completion window, and draw on dedicated batch rate limits separate from real-time usage. To use it, developers upload a JSONL input file via the Files API, create and monitor a batch job, and retrieve the results once processing finishes, choosing from supported models across the DeepSeek, Meta-Llama, Mistral, and Qwen families.
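The upload, create, monitor, retrieve workflow maps to a short script. Below is a minimal sketch using plain HTTP via Python's `requests`; the base URL, endpoint paths, field names (`custom_id`, `input_file_id`, `output_file_id`, the `purpose` value), and status strings are assumptions modeled on the workflow described above, not Together's documented API, and the model name is illustrative. Consult the official API reference for the authoritative names.

```python
# Sketch of the batch lifecycle: build JSONL input, upload it via the
# Files API, create a batch job, poll it, and download the results.
# Endpoint paths and field names are assumptions, not documented values.
import json
import os
import time

import requests

API_BASE = "https://api.together.xyz/v1"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# 1. Build a JSONL input file: one request object per line, each with a
#    caller-chosen custom_id so results can be matched back to inputs.
input_path = "batch_input.jsonl"
with open(input_path, "w") as f:
    for i, prompt in enumerate(["Summarize document A.", "Summarize document B."]):
        f.write(json.dumps({
            "custom_id": f"req-{i}",  # assumed field name
            "body": {
                "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative
                "messages": [{"role": "user", "content": prompt}],
            },
        }) + "\n")

# 2. Upload the file via the Files API.
with open(input_path, "rb") as f:
    upload = requests.post(
        f"{API_BASE}/files",
        headers=HEADERS,
        files={"file": f},
        data={"purpose": "batch-api"},  # assumed purpose value
    ).json()

# 3. Create the batch job against the uploaded file.
batch = requests.post(
    f"{API_BASE}/batches",
    headers=HEADERS,
    json={"input_file_id": upload["id"]},
).json()

# 4. Poll until the job reaches a terminal state. Completion is
#    best-effort within 24 hours, so poll sparingly in real code.
while True:
    batch = requests.get(f"{API_BASE}/batches/{batch['id']}", headers=HEADERS).json()
    if batch["status"] in ("COMPLETED", "FAILED", "EXPIRED"):  # assumed status names
        break
    time.sleep(60)

# 5. Download the JSONL results file: one response object per line,
#    keyed by the custom_id supplied in the input.
if batch["status"] == "COMPLETED":
    content = requests.get(
        f"{API_BASE}/files/{batch['output_file_id']}/content", headers=HEADERS
    ).content
    with open("batch_output.jsonl", "wb") as out:
        out.write(content)
```

Because batches draw on dedicated rate limits separate from real-time usage, a polling loop like this never competes with interactive traffic for quota; the main tuning knob is how often to poll within the 24-hour window.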
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info