Hugging Face adds serverless Inference Providers to Hub model pages: fal-ai, Replicate, SambaNova, Together AI
AI Impact Summary
Hugging Face is introducing a unified serverless inference layer by adding fal-ai, Replicate, SambaNova, and Together AI as integrated providers on Hub model pages and in the JS and Python SDKs. Developers can run inference against any of these providers through a single interface, authenticating either with their own provider key or by routing the request through Hugging Face, without rewriting client code. Routed requests go through the proxy at router.huggingface.co/{provider}. The provided code samples cover cross-provider usage, billing flows, and key management. HF credits for PRO users and a free quota for signed-in users will affect usage patterns and cost planning.
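The two authentication paths described above can be sketched as a small routing decision. This is a minimal illustration, not the SDK's actual logic: only the host pattern router.huggingface.co/{provider} comes from the summary, while the function name `plan_route`, the `Route` type, the provider identifier strings, and the rule that a provider key takes precedence are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Host pattern taken from the summary: router.huggingface.co/{provider}
ROUTER_HOST = "router.huggingface.co"

# Assumed identifier strings for the four providers named in the summary.
SUPPORTED_PROVIDERS = ("fal-ai", "replicate", "sambanova", "together")


@dataclass(frozen=True)
class Route:
    """Hypothetical description of where a request goes and who bills it."""
    mode: str                 # "routed" (via Hugging Face) or "direct"
    base_url: Optional[str]   # None when the provider's own endpoint applies
    credential: str           # the key or token the caller supplied


def plan_route(provider: str,
               hf_token: Optional[str] = None,
               provider_key: Optional[str] = None) -> Route:
    """Pick a route for one provider (illustrative precedence rules)."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if provider_key:
        # Direct call: the provider's endpoint and auth scheme are
        # provider-specific and not covered by the summary, so the
        # base URL is left unset here.
        return Route("direct", None, provider_key)
    if hf_token:
        # Routed call: the request goes through the Hugging Face proxy.
        return Route("routed", f"https://{ROUTER_HOST}/{provider}", hf_token)
    raise ValueError("supply either provider_key or hf_token")


if __name__ == "__main__":
    print(plan_route("together", hf_token="hf_example"))
    print(plan_route("replicate", provider_key="example_key"))
```

Keeping the decision in one function means the rest of a client only deals with a `Route` value, so switching providers or credentials does not touch the calling code.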
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info