Accelerating Stable Diffusion Inference on Intel Sapphire Rapids CPUs
AI Impact Summary
This document details techniques for accelerating Stable Diffusion inference on Intel Sapphire Rapids CPUs, primarily leveraging OpenVINO and system-level optimizations. The core finding is a roughly 10x speedup over Ice Lake Xeons, obtained through OpenVINO's bfloat16 optimization and dynamic shape support. Further acceleration comes from jemalloc, libiomp, and the Intel Extension for PyTorch (IPEX), which exploits AVX-512 and AMX, yielding a final latency of 4.7 seconds to generate a single image.
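The jemalloc and libiomp optimizations mentioned above are typically applied at the environment level rather than in code. A minimal sketch of such a setup is shown below; the library paths, thread count, and script name are illustrative assumptions that depend on your distribution, oneAPI installation, and core count, not values taken from this document.

```shell
# Preload jemalloc (faster allocator) and Intel OpenMP (libiomp5) before
# launching inference. Paths below are assumptions; adjust for your system.
export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libjemalloc.so.2:/opt/intel/oneapi/compiler/latest/lib/libiomp5.so"

# jemalloc tuning options commonly used for latency-sensitive inference.
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto"

# Pin Intel OpenMP threads to physical cores and minimize spin wait time.
export OMP_NUM_THREADS=32   # assumption: set to your physical core count
export KMP_AFFINITY="granularity=fine,compact,1,0"
export KMP_BLOCKTIME=1

# Hypothetical inference script; substitute your own entry point.
python generate_image.py
```

With this environment in place, the same Python inference script benefits from the faster allocator and pinned OpenMP threads without any source changes.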
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info