Arcee AI migrates SLM workloads from AWS EKS to Together Dedicated Endpoints
AI Impact Summary
Arcee AI transitioned seven specialized small language models from AWS EKS to Together Dedicated Endpoints, using serverless inference and a private Hugging Face repo to simplify deployment. The change removes the in-house Kubernetes and GPU-ops burden, enabling faster scaling and more cost-efficient runtimes. Early results show a roughly 94% latency reduction (485 ms down to 29 ms) and throughput of 41+ QPS at 32 concurrent requests, improving responsiveness across coding, general text, and tool-calling tasks. The migration also tightens integration with Conductor and Orchestra, enabling more flexible routing and agentic workflows while preserving the option to fall back to third-party models such as GPT-4.1, Claude 3.7 Sonnet, or DeepSeek-R1 when necessary.
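The fallback behavior described above can be sketched as a simple try-in-order router. This is a minimal illustration, not Arcee's actual Conductor/Orchestra implementation: the caller names and stub functions below are hypothetical stand-ins for real endpoint clients, which would issue HTTP requests to the dedicated endpoint or a third-party API.

```python
def call_with_fallback(callers, prompt):
    """Try each (name, callable) pair in order; return the first success.

    `callers` is an ordered list, primary model first. Any exception from a
    caller triggers the next one; if all fail, raise with the collected errors.
    """
    errors = []
    for name, call in callers:
        try:
            return name, call(prompt)
        except Exception as exc:
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all models failed: {errors}")


# Hypothetical stand-ins: a dedicated-endpoint SLM that is unavailable,
# and a third-party model used as the fallback.
def primary_slm(prompt):
    raise TimeoutError("endpoint busy")  # simulate an overloaded endpoint


def third_party_fallback(prompt):
    return f"fallback answer to: {prompt}"


model, answer = call_with_fallback(
    [("arcee-slm", primary_slm), ("gpt-4.1", third_party_fallback)],
    "hello",
)
# model is "gpt-4.1"; the primary's failure was absorbed by the router
```

A production router would typically also distinguish retryable errors (timeouts, 429s) from permanent ones, and log which model served each request for cost tracking.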
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info