Hugging Face NVIDIA NIM API (serverless) launches for Enterprise Hub; legacy serverless path deprecated
AI Impact Summary
Hugging Face has replaced the deprecated NVIDIA NIM Serverless Inference path with a new NVIDIA NIM API (serverless) on the Hugging Face Hub, available to Enterprise Hub organizations. The new service delivers serverless inference for open models on NVIDIA DGX Cloud hardware, with OpenAI-style API compatibility and pay-as-you-go per-second pricing, affecting deployment patterns for models such as meta-llama/Meta-Llama-3-8B-Instruct and Meta-Llama-3-70B-Instruct. Migration requires provisioning an Enterprise Hub organization, creating a fine-grained token, and using the provided SDK snippets to call chat completions and list models; cost scales with model size, GPU count, and per-request duration.
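The migration steps above (fine-grained token, OpenAI-style chat completions) can be sketched as follows. This is a hedged illustration, not the official snippet: the base URL, the `HF_TOKEN` environment variable, and the request shape are assumptions modeled on the OpenAI-compatible convention the announcement describes, and should be verified against the Hugging Face documentation before use.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint for the NIM API (serverless); verify
# against the official Hugging Face docs before relying on it.
BASE_URL = "https://huggingface.co/api/integrations/dgx/v1"
# Fine-grained Enterprise Hub token, read from the environment (assumption).
TOKEN = os.environ.get("HF_TOKEN", "")


def build_chat_request(model: str, messages: list, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completions request as url/headers/body."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        "body": {"model": model, "messages": messages, "max_tokens": max_tokens},
    }


req = build_chat_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Say hello in one sentence."}],
)

# Only hit the network when a token is actually configured.
if TOKEN:
    http_req = urllib.request.Request(
        req["url"],
        data=json.dumps(req["body"]).encode(),
        headers=req["headers"],
        method="POST",
    )
    with urllib.request.urlopen(http_req) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])
```

Listing available models would follow the same OpenAI-style convention (a GET against a `/models` route under the same base URL), again an assumption to confirm against the published SDK snippets.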
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info