Groq added as Inference Provider on Hugging Face Hub
AI Impact Summary
Groq is now an Inference Provider on the Hugging Face Hub, enabling Groq’s Inference API to serve models directly from model pages. This expands deployment options for real-time LLM workloads by leveraging Groq LPUs to achieve lower latency on supported models such as meta-llama/Llama-4-Scout-17B-16E-Instruct. Developers can integrate via the Python and JavaScript SDKs (huggingface_hub.InferenceClient and @huggingface/inference), with two billing paths: requests made with a direct Groq API key are billed by Groq, while requests routed through Hugging Face are billed through Hugging Face. Official support is slated for HF client v0.33.0, so plan a migration and testing window for when the update lands.
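The integration described above can be sketched with the Python SDK. This is a minimal example assuming huggingface_hub >= 0.33.0 and an HF_TOKEN environment variable; the helper function name (ask_groq) and the prompt are illustrative, not part of the official API. Passing a Hugging Face token routes (and bills) the request through Hugging Face; substituting a Groq API key would bill directly through Groq.

```python
import os

from huggingface_hub import InferenceClient


def ask_groq(prompt: str) -> str:
    """Send a chat request to a Groq-served model via the Hugging Face Hub."""
    client = InferenceClient(
        provider="groq",  # select Groq as the inference provider
        api_key=os.environ["HF_TOKEN"],  # HF token -> routed/billed via Hugging Face
    )
    completion = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


# Example call (requires a valid HF_TOKEN and network access):
# print(ask_groq("Summarize the benefits of LPU-based inference."))
```

The JavaScript SDK (@huggingface/inference) follows the same pattern, taking a `provider: "groq"` option on its chat-completion call.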
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info