Hugging Face Inference Endpoints and SageMaker now support AWS Inferentia2 (Inf2)
AI Impact Summary
AWS Inferentia2 is now broadly accessible via SageMaker and Hugging Face Inference Endpoints, enabling production-scale deployment of models on Inf2 accelerators. The integration leverages Optimum-Neuron to train and deploy transformers and LLMs (including Llama 3 variants) across 14 architectures and 6 tasks, with support for Neuron-based Text Generation Inference (TGI) and OpenAI SDK Messages API compatibility. This expands cost-effective, scalable inference options for Hugging Face models, with autoscaling and scale-to-zero capabilities, and introduces Inf2 flavor choices (Inf2-small and Inf2-xlarge) to match workload size.
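Because the Neuron-based TGI backend exposes an OpenAI-compatible Messages API, a deployed endpoint can be queried with a plain HTTP POST to its chat-completions route. The sketch below uses only the Python standard library; the endpoint URL, token, and environment-variable names are illustrative placeholders, not values from this announcement.

```python
import json
import os
import urllib.request

# Placeholder endpoint details -- substitute your own Inference Endpoint URL
# and Hugging Face token (both are assumptions for illustration).
ENDPOINT_URL = os.environ.get(
    "HF_ENDPOINT_URL", "https://<your-endpoint>.endpoints.huggingface.cloud"
)
HF_TOKEN = os.environ.get("HF_TOKEN", "<hf_token>")

# OpenAI-style chat-completions payload accepted by TGI's Messages API.
payload = {
    "model": "tgi",  # TGI serves a single model; "tgi" is the conventional placeholder
    "messages": [{"role": "user", "content": "What is AWS Inferentia2?"}],
    "max_tokens": 128,
    "stream": False,
}

def query_endpoint():
    """POST the payload to the endpoint's /v1/chat/completions route."""
    req = urllib.request.Request(
        f"{ENDPOINT_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Only issue the network call when a real endpoint has been configured.
if __name__ == "__main__" and os.environ.get("HF_ENDPOINT_URL"):
    reply = query_endpoint()
    print(reply["choices"][0]["message"]["content"])
```

The same route works with the OpenAI SDK by pointing its `base_url` at the endpoint, which is what the Messages API compatibility mentioned above enables.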
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info