Google Cloud Compute Engine Emerald Rapids C4 CPUs outperform N2 for embedding and generation with favorable TCO
AI Impact Summary
Google Cloud Compute Engine C4 (Emerald Rapids) CPUs deliver substantially higher throughput than N2 (Ice Lake) for agentic AI workloads: 10x-24x higher for embedding and 2.3x-3.6x higher for generation. Accounting for price (C4 costs roughly 1.3x more per hour), C4 shows a 7x-19x TCO advantage for embedding and 1.7x-2.9x for generation, implying that CPU-only hosting of lightweight agentic AI pipelines can be cost-effective at scale. The benchmark uses optimum-benchmark with optimum-intel on WhereIsAI/UAE-Large-V1 for embedding and meta-llama/Llama-3.2-3 for generation, with NUMA binding and IPEX (Intel Extension for PyTorch) optimizations, highlighting concrete deployment parameters and potential further gains from the newer Granite Rapids generation for Llama-3 workloads.
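The TCO figures above follow from dividing the throughput gain by the price premium. A minimal sketch of that arithmetic, assuming the ~1.3x hourly price ratio stated in the summary (actual on-demand pricing varies by region and machine shape):

```python
# Price-performance (TCO) arithmetic behind the summary's figures.
# Assumption: C4 costs ~1.3x N2 per hour, per the summary's "~1.3x" estimate.
PRICE_RATIO = 1.3  # C4 hourly cost relative to N2 (approximate)

def tco_advantage(throughput_ratio: float, price_ratio: float = PRICE_RATIO) -> float:
    """Throughput gain per dollar: speedup divided by the price premium."""
    return throughput_ratio / price_ratio

# Embedding: 10x-24x throughput -> roughly 7x-19x better price-performance
embed_low, embed_high = tco_advantage(10), tco_advantage(24)
# Generation: 2.3x-3.6x throughput -> roughly 1.7x-2.9x
gen_low, gen_high = tco_advantage(2.3), tco_advantage(3.6)

print(f"embedding: {embed_low:.1f}x-{embed_high:.1f}x")    # embedding: 7.7x-18.5x
print(f"generation: {gen_low:.1f}x-{gen_high:.1f}x")       # generation: 1.8x-2.8x
```

The computed values (7.7x-18.5x, 1.8x-2.8x) round slightly differently from the summary's 7x-19x and 1.7x-2.9x ranges, consistent with the price ratio being an approximation.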
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info