Together Deployments: Custom Metric Autoscaling
AI Impact Summary
Together Deployments now lets users scale applications based on custom Prometheus metrics exposed by worker endpoints. This extends scaling beyond the standard Together AI metrics, so teams can react to application-specific signals such as vllm:num_requests_running and gain finer control over resource allocation and application performance. The change introduces a new configuration option, and the selected metric must be monitored for scaling to be effective.
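To make the idea concrete, here is a minimal Python sketch of how a custom metric could drive a replica count. It is an illustration only, not Together's implementation or configuration format: the helper names (parse_metric, desired_replicas), the sample exposition text, the target value, and the replica bounds are all assumptions, and the scaling rule is the generic target-per-replica calculation used by many autoscalers.

```python
import math

# Hypothetical sample of what a worker endpoint might expose in the
# Prometheus text format; the label and value are made up for illustration.
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="example"} 12.0
"""

def parse_metric(exposition: str, name: str) -> list[float]:
    """Collect every sample value for `name` from Prometheus text output."""
    values = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Form: name{labels} value [timestamp]
            head, _, rest = line.rpartition("}")
            metric_name = head.split("{", 1)[0]
        else:
            # Form: name value [timestamp]
            metric_name, _, rest = line.partition(" ")
        fields = rest.split()
        if metric_name == name and fields:
            values.append(float(fields[0]))
    return values

def desired_replicas(per_worker_values, target_per_replica,
                     min_replicas=1, max_replicas=8):
    """Generic rule (an assumption): enough replicas to keep the summed
    metric at or below `target_per_replica` each, clamped to the bounds."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(sum(per_worker_values) / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))

# Usage: combine readings from two hypothetical workers.
readings = parse_metric(SAMPLE, "vllm:num_requests_running") + [9.0]
print(desired_replicas(readings, target_per_replica=8))
```

A gauge such as vllm:num_requests_running suits this kind of rule because it reflects current load directly; a counter would first need to be converted to a rate.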
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info