Hugging Face deprecates serverless GPU inference; launches Cloudflare Workers AI deployment
AI Impact Summary
Hugging Face is deprecating its existing serverless GPU inference integration and directing users to the Inference API, Inference Endpoints, or other deployment options. The new "Deploy on Cloudflare Workers AI" integration provides edge-deployed, pay-per-use serverless GPU inference, with models such as Llama 2 7B and Hermes 2 Pro on Mistral 7B available via REST or the Cloudflare AI SDK. Teams should evaluate migrating to HF's alternative deployment options or to Cloudflare Workers AI, confirming model support and account/token readiness, and accounting for potential changes in latency and cost at the edge.
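As a rough illustration of the REST path, the sketch below builds a text-generation request against Cloudflare's Workers AI endpoint. The account ID, API token, and model slug are placeholders, and the exact model identifiers available to your account may differ; treat this as a minimal sketch of the request shape, not a definitive client.

```python
import json
import urllib.request

# Workers AI REST endpoint template; {account} and {model} are filled in per call.
API_BASE = "https://api.cloudflare.com/client/v4/accounts/{account}/ai/run/{model}"


def build_request(account_id: str, api_token: str, model: str, prompt: str):
    """Build the URL, headers, and JSON body for a Workers AI text-generation call."""
    url = API_BASE.format(account=account_id, model=model)
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt}).encode()
    return url, headers, body


def run_model(account_id: str, api_token: str, model: str, prompt: str):
    """Send the request; requires a real Cloudflare account ID and API token."""
    url, headers, body = build_request(account_id, api_token, model, prompt)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (not executed here; placeholder credentials and model slug):
# run_model("ACCOUNT_ID", "API_TOKEN", "@cf/meta/llama-2-7b-chat-int8",
#           "Explain edge inference in one sentence.")
```

Because billing is pay-per-use, each call like this is metered individually, which is part of the cost profile teams should weigh when comparing against always-on Inference Endpoints.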
Affected Systems
- Date: Not specified
- Change type: Capability
- Severity: Info