Hugging Face Inference for PROs adds exclusive endpoints and higher rate limits
AI Impact Summary
Hugging Face is expanding PRO capabilities: PRO subscribers get exclusive HTTP endpoints for a curated set of models and higher rate limits on the Inference API, accelerating experimentation. The roster includes Meta Llama 3 Instruct, Mixtral, Nous Hermes 2 Mixtral, Zephyr, Llama 2 Chat, Mistral 7B, Code Llama, Stable Diffusion XL (with a 3B-parameter UNet), and Bark, among others, with ultra-fast inference for the text models powered by text-generation-inference. This enables faster prototyping for teams on PRO, but the service is explicitly not intended for heavy production use; for production workloads, use Inference Endpoints. Access is authenticated with a user token, and the endpoints accept common generation parameters via raw HTTP or the InferenceClient in huggingface_hub.
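As a minimal sketch of what a token-authenticated call with generation parameters looks like, the snippet below builds the JSON payload for a raw HTTP request to the Inference API. The model ID, token placeholder, and parameter values are illustrative assumptions, not values taken from the announcement; substitute your own PRO user token.

```python
import json

# Standard Inference API URL pattern; the model ID is an illustrative choice.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct"
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder PRO user access token

payload = {
    "inputs": "Explain rate limits in one sentence.",
    "parameters": {
        "max_new_tokens": 64,   # common generation parameters
        "temperature": 0.7,
        "top_p": 0.95,
    },
}
body = json.dumps(payload)

# A real request would then be sent with, for example:
#   import requests
#   r = requests.post(API_URL, headers=headers, data=body)
#   print(r.json())
print(body)
```

The equivalent call through `huggingface_hub` would use `InferenceClient(token=...).text_generation(prompt, model=..., max_new_tokens=64)`, which wraps the same endpoint and parameters.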
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info