SDXL inference optimizations in Diffusers: FP16, SDPA, and torch.compile
AI Impact Summary
This SDXL optimization briefing demonstrates practical inference-speed and memory-reduction strategies in diffusers for stabilityai/stable-diffusion-xl-base-1.0. The full-precision baseline uses ~28GB of GPU memory and takes ~72s per 4-image batch. Switching to FP16 cuts that to ~21.7GB and 14.8s, and scaled dot-product attention (SDPA) leaves memory unchanged while dropping latency to 11.4s. Adding torch.compile lowers the time further to about 10.2s, with the caveat that the first compiled call is slow while subsequent runs are faster. The briefing also covers CPU offloading options (model CPU offload and sequential CPU offload) that trim memory to ~20.2GB or ~19.9GB respectively, at the expense of latency, with sequential offload pushing it to around 67s.
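A minimal sketch of the FP16 + SDPA + torch.compile stack described above, assuming PyTorch 2.x on a CUDA GPU; the prompt and batch size are illustrative, not from the briefing. On PyTorch 2.0+, diffusers dispatches attention to torch.nn.functional.scaled_dot_product_attention by default, so SDPA needs no extra call:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load in half precision (FP16): ~28GB -> ~21.7GB in the numbers above.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# SDPA is used automatically on PyTorch 2.0+; memory stays flat while
# attention gets faster (~14.8s -> ~11.4s per 4-image batch above).

# Compile the UNet, the hot loop of denoising. The first call pays a
# one-time compilation cost; subsequent calls run faster (~10.2s above).
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# Illustrative prompt; a 4-image batch matches the briefing's benchmark.
images = pipe(
    prompt="a photo of an astronaut riding a horse on mars",
    num_images_per_prompt=4,
).images
```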
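The offloading trade-off can be sketched the same way; enable_model_cpu_offload and enable_sequential_cpu_offload are the diffusers methods behind the two options, and both expect the pipeline to stay on CPU at load time so accelerate can manage device placement:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Do not call .to("cuda") here; offloading manages device placement itself.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)

# Model-level offload: whole submodels (text encoders, UNet, VAE) move to
# the GPU only while they run. Modest savings (~20.2GB above) for a small
# latency cost.
pipe.enable_model_cpu_offload()

# Sequential offload moves weights layer by layer instead: lower memory
# (~19.9GB above) but latency grows to around 67s. Use one or the other,
# not both.
# pipe.enable_sequential_cpu_offload()

images = pipe(
    prompt="a photo of an astronaut riding a horse on mars",
    num_images_per_prompt=4,
).images
```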
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info