Memory-efficient Diffusion Transformers with Quanto and Diffusers — FP8 quantization for PixArt-Sigma and Stable Diffusion 3
AI Impact Summary
The article demonstrates that Transformer-based diffusion backbones (0.6B–8B parameters) incur high memory usage, especially when combined with multiple text encoders, as in Stable Diffusion 3. By applying Quanto quantization within Diffusers, the authors obtain meaningful memory savings with qfloat8 (FP8) and qint8 weights at minimal quality loss, and the largest additional savings come from also quantizing the text encoders, which matters most when a pipeline ships several of them. This makes larger diffusion models feasible on consumer GPUs (the article's FP16 baselines are measured on an H100) and shortens iteration time for experimentation. The trade-offs are added latency and possible quality degradation under aggressive quantization, and migrating a pipeline requires explicit steps to quantize and then freeze the affected components.
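A minimal sketch of the quantize-and-freeze flow the summary refers to, using the optimum-quanto `quantize`/`freeze` API together with a Diffusers pipeline. The PixArt-Sigma checkpoint ID and the choice of qfloat8 weights are illustrative assumptions, not a prescription from the article.

```python
import torch
from diffusers import PixArtSigmaPipeline
from optimum.quanto import freeze, qfloat8, quantize

# Load the pipeline in FP16; the checkpoint ID below is an assumed example.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    torch_dtype=torch.float16,
).to("cuda")

# Quantize the diffusion transformer's weights to FP8, then freeze them so
# the quantized weights replace the FP16 originals and stay fixed at inference.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)

# Optionally quantize the text encoder as well -- per the summary, this is
# where the biggest savings come from when a pipeline bundles several encoders.
quantize(pipe.text_encoder, weights=qfloat8)
freeze(pipe.text_encoder)

image = pipe("a small cactus with a happy face in the Sahara desert").images[0]
```

In optimum-quanto, `quantize` swaps in quantized module wrappers while `freeze` converts the float weights into static quantized tensors, which is when the memory savings are actually realized.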
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info