Hugging Face adds serverless Inference Providers on Hub: fal-ai, Replicate, SambaNova, Together AI
AI Impact Summary
Hugging Face is expanding its Inference API with four serverless Inference Providers (fal-ai, Replicate, SambaNova, Together AI), surfaced directly on model pages and in the JavaScript and Python SDKs, so developers can run inference end to end without managing dedicated deployments. Two billing/auth models are supported: a custom provider key, which calls the provider directly and is billed by that provider, or a request routed through Hugging Face, which is billed to the caller's HF account. This changes how developers manage credentials and determines who pays for usage. The update also widens the available inference paths (e.g., DeepSeek-R1, Llama-3.3-70B-Instruct) and may influence cost, latency, and migration decisions for existing workloads.
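The two billing/auth paths above can be sketched as a small helper. The "hf_" prefix is the real convention for Hugging Face access tokens, but `resolve_route()` itself is an illustrative function written for this note, not part of any SDK.

```python
def resolve_route(api_key: str, provider: str) -> dict:
    """Sketch of how the key type determines routing and billing.

    - A Hugging Face token ("hf_..."): the request is routed through
      Hugging Face and billed to the caller's HF account.
    - A custom provider key: the request goes directly to the provider
      and is billed by that provider.
    """
    if api_key.startswith("hf_"):
        return {"route": "huggingface", "billed_to": "hf_account",
                "provider": provider}
    return {"route": "direct", "billed_to": provider, "provider": provider}


if __name__ == "__main__":
    # Routed request, billed to the HF account
    print(resolve_route("hf_example_token", "together"))
    # Direct request with a custom key, billed by the provider
    print(resolve_route("provider_key_example", "replicate"))
```

In the actual SDKs this choice is implicit: passing an HF token selects the routed, HF-billed path, while supplying a provider's own key calls that provider directly.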
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info