Hugging Face Inference Endpoints and SageMaker now support AWS Inferentia2 (Inf2)
AI Impact Summary
AWS Inferentia2 is now broadly accessible via SageMaker and Hugging Face Inference Endpoints, enabling production-scale deployment of models on Inf2 accelerators. The integration leverages Optimum-Neuron to train and deploy transformers and LLMs (including Llama 3 variants) across 14 architectures and 6 tasks, with support for Neuron-based Text Generation Inference (TGI) and OpenAI SDK Messages API compatibility. This expands cost-effective, scalable inference options for Hugging Face models, with autoscaling and scale-to-zero capabilities, and introduces Inf2 flavor choices (Inf2-small and Inf2-xlarge) to match workload size.
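Because the Neuron-based TGI backend exposes an OpenAI-compatible Messages API, a deployed endpoint can be queried with a plain HTTP POST to its chat-completions route. The sketch below uses only the Python standard library; the endpoint URL, token, and environment-variable names are illustrative placeholders, not values from this announcement.

```python
import json
import os
import urllib.request

# Placeholder endpoint details -- substitute your own Inference Endpoint URL
# and Hugging Face token (both are assumptions for illustration).
ENDPOINT_URL = os.environ.get(
    "HF_ENDPOINT_URL", "https://<your-endpoint>.endpoints.huggingface.cloud"
)
HF_TOKEN = os.environ.get("HF_TOKEN", "<hf_token>")

# OpenAI-style chat-completions payload accepted by TGI's Messages API.
payload = {
    "model": "tgi",  # TGI serves a single model; "tgi" is the conventional placeholder
    "messages": [{"role": "user", "content": "What is AWS Inferentia2?"}],
    "max_tokens": 128,
    "stream": False,
}

def query_endpoint():
    """POST the payload to the endpoint's /v1/chat/completions route."""
    req = urllib.request.Request(
        f"{ENDPOINT_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Only issue the network call when a real endpoint has been configured.
if __name__ == "__main__" and os.environ.get("HF_ENDPOINT_URL"):
    reply = query_endpoint()
    print(reply["choices"][0]["message"]["content"])
```

The same route works with the OpenAI SDK by pointing its `base_url` at the endpoint, which is what the Messages API compatibility mentioned above enables.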
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info