Google Cloud Compute Engine Emerald Rapids C4 CPUs outperform N2 for embedding and generation with favorable TCO
AI Impact Summary
Google Cloud Compute Engine C4 (Emerald Rapids) CPUs deliver substantially higher throughput than N2 (Ice Lake) for agentic AI workloads: 10x-24x higher for embedding and 2.3x-3.6x higher for generation. Accounting for price (C4 costs roughly 1.3x more per hour), C4 shows a 7x-19x TCO advantage for embedding and 1.7x-2.9x for generation, implying that CPU-only hosting of lightweight agentic AI pipelines can be cost-effective at scale. The benchmark uses optimum-benchmark with optimum-intel on WhereIsAI/UAE-Large-V1 for embedding and meta-llama/Llama-3.2-3 for generation, with NUMA binding and IPEX (Intel Extension for PyTorch) optimizations, highlighting concrete deployment parameters and potential further gains from the newer Granite Rapids generation for Llama-3 workloads.
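The TCO figures above follow from dividing the throughput gain by the price premium. A minimal sketch of that arithmetic, assuming the ~1.3x hourly price ratio stated in the summary (actual on-demand pricing varies by region and machine shape):

```python
# Price-performance (TCO) arithmetic behind the summary's figures.
# Assumption: C4 costs ~1.3x N2 per hour, per the summary's "~1.3x" estimate.
PRICE_RATIO = 1.3  # C4 hourly cost relative to N2 (approximate)

def tco_advantage(throughput_ratio: float, price_ratio: float = PRICE_RATIO) -> float:
    """Throughput gain per dollar: speedup divided by the price premium."""
    return throughput_ratio / price_ratio

# Embedding: 10x-24x throughput -> roughly 7x-19x better price-performance
embed_low, embed_high = tco_advantage(10), tco_advantage(24)
# Generation: 2.3x-3.6x throughput -> roughly 1.7x-2.9x
gen_low, gen_high = tco_advantage(2.3), tco_advantage(3.6)

print(f"embedding: {embed_low:.1f}x-{embed_high:.1f}x")    # embedding: 7.7x-18.5x
print(f"generation: {gen_low:.1f}x-{gen_high:.1f}x")       # generation: 1.8x-2.8x
```

The computed values (7.7x-18.5x, 1.8x-2.8x) round slightly differently from the summary's 7x-19x and 1.7x-2.9x ranges, consistent with the price ratio being an approximation.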
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info