Groq becomes an Inference Provider on Hugging Face Hub with Llama 4 and QwQ-32B support
AI Impact Summary
Groq is now an official Inference Provider on the Hugging Face Hub, enabling serverless inference for Groq-hosted models directly from Hugging Face model pages and via the JS and Python client SDKs. The integration supports models such as Meta's Llama 4 and Qwen's QwQ-32B, and leverages Groq's LPU architecture for low-latency inference in real-time applications. There are two call modes: Custom key (requests go directly to Groq using your own Groq API key) and Routed by HF (requests are routed through Hugging Face and billed to the HF account). This has direct cost and credential-management implications: teams should decide whether to use direct Groq keys or HF routing, monitor provider versus HF billing, and prepare to update the SDK to v0.33.0 when it is released.
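As a rough sketch of what a call looks like from the Python side (assuming the standard huggingface_hub InferenceClient provider interface; the model ID and key placeholders below are illustrative, not taken from the announcement):

```python
# Sketch: calling a Groq-hosted model through Hugging Face Inference Providers.
# Assumes huggingface_hub's InferenceClient with provider selection; placeholders are illustrative.
from huggingface_hub import InferenceClient

# Routed by HF: pass a Hugging Face access token; usage is billed to the HF account.
client = InferenceClient(
    provider="groq",
    api_key="hf_xxx",  # HF user access token (placeholder)
)

# Custom key: pass your own Groq API key instead, so requests go directly to Groq
# and are billed by Groq rather than through HF.
# client = InferenceClient(provider="groq", api_key="gsk_xxx")

completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # example Groq-hosted model on the Hub
    messages=[{"role": "user", "content": "Summarize what an LPU is in one sentence."}],
)
print(completion.choices[0].message.content)
```

Switching between the two billing modes is only a matter of which key is supplied, which is why credential management is the main operational decision here.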
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info