Together Batch API launches with 50% cost savings for asynchronous LLM processing
AI Impact Summary
Together is launching a Batch API that processes LLM requests asynchronously at half the price of real-time inference, targeting non-urgent workloads such as evaluations, data transformations, and offline summarization. A batch can include up to 50,000 requests in a single JSONL file of up to 100MB; jobs are typically completed within hours, with a best-effort 24-hour completion window, and draw on dedicated batch rate limits separate from real-time usage. To use it, developers upload a JSONL input file via the Files API, create and monitor a batch job, and retrieve the results once processing finishes, choosing from supported models across the DeepSeek, Meta-Llama, Mistral, and Qwen families.
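The upload, create, monitor, retrieve workflow maps to a short script. Below is a minimal sketch using plain HTTP via Python's `requests`; the base URL, endpoint paths, field names (`custom_id`, `input_file_id`, `output_file_id`, the `purpose` value), and status strings are assumptions modeled on the workflow described above, not Together's documented API, and the model name is illustrative. Consult the official API reference for the authoritative names.

```python
# Sketch of the batch lifecycle: build JSONL input, upload it via the
# Files API, create a batch job, poll it, and download the results.
# Endpoint paths and field names are assumptions, not documented values.
import json
import os
import time

import requests

API_BASE = "https://api.together.xyz/v1"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# 1. Build a JSONL input file: one request object per line, each with a
#    caller-chosen custom_id so results can be matched back to inputs.
input_path = "batch_input.jsonl"
with open(input_path, "w") as f:
    for i, prompt in enumerate(["Summarize document A.", "Summarize document B."]):
        f.write(json.dumps({
            "custom_id": f"req-{i}",  # assumed field name
            "body": {
                "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative
                "messages": [{"role": "user", "content": prompt}],
            },
        }) + "\n")

# 2. Upload the file via the Files API.
with open(input_path, "rb") as f:
    upload = requests.post(
        f"{API_BASE}/files",
        headers=HEADERS,
        files={"file": f},
        data={"purpose": "batch-api"},  # assumed purpose value
    ).json()

# 3. Create the batch job against the uploaded file.
batch = requests.post(
    f"{API_BASE}/batches",
    headers=HEADERS,
    json={"input_file_id": upload["id"]},
).json()

# 4. Poll until the job reaches a terminal state. Completion is
#    best-effort within 24 hours, so poll sparingly in real code.
while True:
    batch = requests.get(f"{API_BASE}/batches/{batch['id']}", headers=HEADERS).json()
    if batch["status"] in ("COMPLETED", "FAILED", "EXPIRED"):  # assumed status names
        break
    time.sleep(60)

# 5. Download the JSONL results file: one response object per line,
#    keyed by the custom_id supplied in the input.
if batch["status"] == "COMPLETED":
    content = requests.get(
        f"{API_BASE}/files/{batch['output_file_id']}/content", headers=HEADERS
    ).content
    with open("batch_output.jsonl", "wb") as out:
        out.write(content)
```

Because batches draw on dedicated rate limits separate from real-time usage, a polling loop like this never competes with interactive traffic for quota; the main tuning knob is how often to poll within the 24-hour window.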
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info