
Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face

AI Impact Summary

Google Cloud C4 VMs, powered by Intel Xeon 6 processors (Granite Rapids), deliver a 70% Total Cost of Ownership (TCO) improvement over previous-generation C3 VMs when running OpenAI's GPT OSS large language model. The improvement stems from optimized expert execution and the efficiency of the model's Mixture of Experts (MoE) architecture, yielding 1.7x the throughput per vCPU and a correspondingly lower cost per generated token. Developers and teams that rely on LLMs for text generation get a more cost-effective option for CPU-based inference.
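The link between throughput per vCPU and cost per token can be checked with a few lines of arithmetic. The sketch below is illustrative only: the price and baseline throughput are hypothetical placeholders, not measured values or real Google Cloud prices, and it assumes both VM types cost the same per vCPU-hour. Only the 1.7x ratio comes from the summary above.

```python
def cost_per_million_tokens(price_per_vcpu_hour: float,
                            tokens_per_sec_per_vcpu: float) -> float:
    """Dollar cost of generating one million tokens on a single vCPU."""
    tokens_per_hour = tokens_per_sec_per_vcpu * 3600
    return price_per_vcpu_hour / tokens_per_hour * 1_000_000


# Hypothetical placeholder inputs, NOT benchmark results or real prices.
PRICE = 0.05                          # dollars per vCPU-hour, assumed equal for C3 and C4
C3_THROUGHPUT = 1.0                   # tokens/sec per vCPU, arbitrary baseline
C4_THROUGHPUT = 1.7 * C3_THROUGHPUT   # the 1.7x figure quoted in the summary

c3_cost = cost_per_million_tokens(PRICE, C3_THROUGHPUT)
c4_cost = cost_per_million_tokens(PRICE, C4_THROUGHPUT)

# Relative gain in tokens generated per dollar (C4 vs. C3).
tokens_per_dollar_gain = c3_cost / c4_cost - 1
# Relative drop in cost per generated token (C4 vs. C3).
cost_per_token_drop = 1 - c4_cost / c3_cost

print(f"tokens per dollar: +{tokens_per_dollar_gain:.0%}")
print(f"cost per token:    -{cost_per_token_drop:.0%}")
```

Under the equal-price assumption, 1.7x throughput means 70% more tokens per dollar but only about 41% lower cost per token, so the headline 70% figure is best read as a performance-per-dollar improvement rather than a 70% cut in spend. Any actual price difference between C3 and C4 would shift both numbers.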

Affected Systems

GPT-3.5 Turbo, OpenAI API

Date: not specified
Change type: capability
Severity: info

