Groq becomes an Inference Provider on Hugging Face Hub with Llama 4 and QwQ-32B support
AI Impact Summary
Groq is now an official Inference Provider on the Hugging Face Hub, enabling serverless inference for Groq-hosted models directly from Hugging Face model pages and via the JS and Python client SDKs. The integration supports models such as Meta's Llama 4 and Qwen's QwQ-32B, and leverages Groq's LPU architecture for low-latency inference in real-time applications. There are two call modes: Custom key (requests go directly to Groq using your own Groq API key) and Routed by HF (requests are routed through Hugging Face and billed to the HF account). This has direct cost and credential-management implications: teams should decide whether to use direct Groq keys or HF routing, monitor provider versus HF billing, and prepare to update the SDK to v0.33.0 when it is released.
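As a rough sketch of what a call looks like from the Python side (assuming the standard huggingface_hub InferenceClient provider interface; the model ID and key placeholders below are illustrative, not taken from the announcement):

```python
# Sketch: calling a Groq-hosted model through Hugging Face Inference Providers.
# Assumes huggingface_hub's InferenceClient with provider selection; placeholders are illustrative.
from huggingface_hub import InferenceClient

# Routed by HF: pass a Hugging Face access token; usage is billed to the HF account.
client = InferenceClient(
    provider="groq",
    api_key="hf_xxx",  # HF user access token (placeholder)
)

# Custom key: pass your own Groq API key instead, so requests go directly to Groq
# and are billed by Groq rather than through HF.
# client = InferenceClient(provider="groq", api_key="gsk_xxx")

completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # example Groq-hosted model on the Hub
    messages=[{"role": "user", "content": "Summarize what an LPU is in one sentence."}],
)
print(completion.choices[0].message.content)
```

Switching between the two billing modes is only a matter of which key is supplied, which is why credential management is the main operational decision here.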
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info