Hugging Face NVIDIA NIM API (serverless) launches for Enterprise Hub; legacy serverless path deprecated
AI Impact Summary
Hugging Face has replaced the deprecated NVIDIA NIM Serverless Inference path with a new NVIDIA NIM API (serverless) on the Hugging Face Hub, available to Enterprise Hub organizations. The new service delivers serverless inference for open models on NVIDIA DGX Cloud hardware, with OpenAI-style API compatibility and pay-as-you-go per-second pricing, affecting deployment patterns for models such as meta-llama/Meta-Llama-3-8B-Instruct and Meta-Llama-3-70B-Instruct. Migration requires provisioning an Enterprise Hub organization, creating a fine-grained token, and using the provided SDK snippets to call chat completions and list models; cost scales with model size, GPU count, and per-request duration.
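The migration steps above (fine-grained token, OpenAI-style chat completions) can be sketched as follows. This is a hedged illustration, not the official snippet: the base URL, the `HF_TOKEN` environment variable, and the request shape are assumptions modeled on the OpenAI-compatible convention the announcement describes, and should be verified against the Hugging Face documentation before use.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint for the NIM API (serverless); verify
# against the official Hugging Face docs before relying on it.
BASE_URL = "https://huggingface.co/api/integrations/dgx/v1"
# Fine-grained Enterprise Hub token, read from the environment (assumption).
TOKEN = os.environ.get("HF_TOKEN", "")


def build_chat_request(model: str, messages: list, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completions request as url/headers/body."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        "body": {"model": model, "messages": messages, "max_tokens": max_tokens},
    }


req = build_chat_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Say hello in one sentence."}],
)

# Only hit the network when a token is actually configured.
if TOKEN:
    http_req = urllib.request.Request(
        req["url"],
        data=json.dumps(req["body"]).encode(),
        headers=req["headers"],
        method="POST",
    )
    with urllib.request.urlopen(http_req) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])
```

Listing available models would follow the same OpenAI-style convention (a GET against a `/models` route under the same base URL), again an assumption to confirm against the published SDK snippets.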
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info