OpenVINO NNCF-based optimization of Stable Diffusion on Intel CPUs with ToMe and Diffusers
AI Impact Summary
The article describes a CPU-centric optimization workflow for Stable Diffusion using OpenVINO, NNCF, Diffusers, and Token Merging (ToMe). It explains that the UNet denoiser is the inference bottleneck and that conventional post-training 8-bit quantization is insufficient on its own, necessitating Quantization-Aware Training with knowledge distillation and an exponential moving average (EMA) of weights to preserve accuracy. Reported results show substantial CPU inference speedups (up to 5.1x with ToMe plus 8-bit quantization) and a 4x reduction in model footprint versus the PyTorch baseline, making edge and CPU-only deployments feasible on Intel Xeon processors with Deep Learning Boost. This signals a viable migration path for CPU-only inference pipelines, but it requires a careful training and tuning workflow to maintain image quality across the target prompts and step counts.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info