Hugging Face Inference for PROs adds exclusive endpoints and higher rate limits
AI Impact Summary
Hugging Face is expanding PRO capabilities: PRO subscribers get exclusive HTTP endpoints for a curated set of models and higher rate limits on the Inference API, accelerating experimentation. The roster includes Meta Llama 3 Instruct, Mixtral, Nous Hermes 2 Mixtral, Zephyr, Llama 2 Chat, Mistral 7B, Code Llama, Stable Diffusion XL (with a 3B-parameter UNet), and Bark, among others, with ultra-fast inference for the text models powered by text-generation-inference. This enables faster prototyping for teams on PRO, but the service is explicitly not intended for heavy production use; for production workloads, use Inference Endpoints. Access is authenticated with a user token, and the endpoints accept common generation parameters via raw HTTP or the InferenceClient in huggingface_hub.
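As a minimal sketch of what a token-authenticated call with generation parameters looks like, the snippet below builds the JSON payload for a raw HTTP request to the Inference API. The model ID, token placeholder, and parameter values are illustrative assumptions, not values taken from the announcement; substitute your own PRO user token.

```python
import json

# Standard Inference API URL pattern; the model ID is an illustrative choice.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct"
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder PRO user access token

payload = {
    "inputs": "Explain rate limits in one sentence.",
    "parameters": {
        "max_new_tokens": 64,   # common generation parameters
        "temperature": 0.7,
        "top_p": 0.95,
    },
}
body = json.dumps(payload)

# A real request would then be sent with, for example:
#   import requests
#   r = requests.post(API_URL, headers=headers, data=body)
#   print(r.json())
print(body)
```

The equivalent call through `huggingface_hub` would use `InferenceClient(token=...).text_generation(prompt, model=..., max_new_tokens=64)`, which wraps the same endpoint and parameters.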
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info