Google Cloud Compute Engine C4 Emerald Rapids CPUs outperform N2 for embedding and generation workloads
AI Impact Summary
The benchmark compares Google Cloud Compute Engine N2 (Ice Lake) and C4 (Emerald Rapids with AMX) CPU instances, using optimum-benchmark with optimum-intel to measure text embedding and text generation workloads. C4 delivers 10x-24x higher throughput for embedding and 2.3x-3.6x higher throughput for generation; with C4's hourly cost about 1.3x that of N2, this yields a 7x-19x total-cost-of-ownership advantage for embedding and a 1.7x-2.9x advantage for generation across the tested ranges. This suggests CPU-only deployment of lightweight agentic AI stacks is viable at scale, though real-world results depend on model choice (e.g., WhereIsAI/UAE-Large-V1, meta-llama/Llama-3.2-3) and configuration.
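The TCO figures above follow from dividing the measured throughput speedup by the relative hourly cost. A minimal sketch of that arithmetic, using the numbers from the summary (the `tco_advantage` helper name is ours, not part of optimum-benchmark):

```python
def tco_advantage(speedup: float, cost_ratio: float) -> float:
    """Throughput-per-dollar advantage: speedup divided by relative hourly cost."""
    return speedup / cost_ratio

# C4 hourly price relative to N2, per the summary
C4_COST_RATIO = 1.3

# Embedding: 10x-24x throughput -> roughly 7x-19x better TCO
embed_low = tco_advantage(10, C4_COST_RATIO)
embed_high = tco_advantage(24, C4_COST_RATIO)

# Generation: 2.3x-3.6x throughput -> roughly 1.7x-2.9x better TCO
gen_low = tco_advantage(2.3, C4_COST_RATIO)
gen_high = tco_advantage(3.6, C4_COST_RATIO)

print(f"embedding TCO advantage: {embed_low:.1f}x-{embed_high:.1f}x")
print(f"generation TCO advantage: {gen_low:.1f}x-{gen_high:.1f}x")
```

Running this reproduces the quoted ranges (about 7.7x-18.5x for embedding and 1.8x-2.8x for generation), confirming the summary's rounded 7x-19x and 1.7x-2.9x figures are internally consistent.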
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info