Hugging Face deprecates serverless GPU inference; launches Cloudflare Workers AI deployment
AI Impact Summary
Hugging Face is deprecating its existing serverless GPU inference integration and directing users to the Inference API, Inference Endpoints, or other deployment options. The new "Deploy on Cloudflare Workers AI" integration provides edge-deployed, pay-per-use serverless GPU inference, with models such as Llama 2 7B and Hermes 2 Pro on Mistral 7B available via REST or the Cloudflare AI SDK. Teams should evaluate migrating to HF's alternative deployment options or to Cloudflare Workers AI, confirming model support and account/token readiness, and accounting for potential changes in latency and cost at the edge.
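As a rough illustration of the REST path, the sketch below builds a text-generation request against Cloudflare's Workers AI endpoint. The account ID, API token, and model slug are placeholders, and the exact model identifiers available to your account may differ; treat this as a minimal sketch of the request shape, not a definitive client.

```python
import json
import urllib.request

# Workers AI REST endpoint template; {account} and {model} are filled in per call.
API_BASE = "https://api.cloudflare.com/client/v4/accounts/{account}/ai/run/{model}"


def build_request(account_id: str, api_token: str, model: str, prompt: str):
    """Build the URL, headers, and JSON body for a Workers AI text-generation call."""
    url = API_BASE.format(account=account_id, model=model)
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt}).encode()
    return url, headers, body


def run_model(account_id: str, api_token: str, model: str, prompt: str):
    """Send the request; requires a real Cloudflare account ID and API token."""
    url, headers, body = build_request(account_id, api_token, model, prompt)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (not executed here; placeholder credentials and model slug):
# run_model("ACCOUNT_ID", "API_TOKEN", "@cf/meta/llama-2-7b-chat-int8",
#           "Explain edge inference in one sentence.")
```

Because billing is pay-per-use, each call like this is metered individually, which is part of the cost profile teams should weigh when comparing against always-on Inference Endpoints.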
Affected Systems
- Date: Not specified
- Change type: Capability
- Severity: Info