Chipmunk: Training-Free Acceleration of Diffusion Transformers
Action Required
Organizations can significantly reduce the computational cost and latency of diffusion transformer models, enabling faster video generation and image processing applications.
AI Impact Summary
Chipmunk introduces a training-free method for accelerating diffusion transformers (DiTs) by dynamically computing sparse "deltas" against cached attention and MLP activations. The approach exploits two observations: DiT activations change slowly across diffusion steps, and the changes that do occur are sparse. Combined with hardware-aware sparsity patterns, this yields significant speedups – up to 3.7x for video generation with HunyuanVideo and 1.6x for image generation with FLUX.1-dev. The key innovation is a column-sparse attention and MLP kernel that reduces memory access by recomputing only the changed columns, reaching up to 93% sparsity.
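The core idea – cache a block's output, then recompute only the columns whose inputs changed most since the last diffusion step – can be illustrated with a minimal NumPy sketch. This is not Chipmunk's actual kernel (which is a fused GPU implementation); the function `sparse_delta_update`, the `keep_frac` parameter, and the cache layout are all illustrative assumptions.

```python
import numpy as np

def sparse_delta_update(cache, new_input, fn, keep_frac=0.07):
    """Recompute fn only for the most-changed input columns.

    cache: dict holding 'input' and 'output' arrays from the previous step.
    fn: a column-wise function standing in for an attention/MLP block.
    keep_frac: fraction of columns recomputed (0.07 ≈ 93% sparsity).
    All names here are illustrative, not Chipmunk's real API.
    """
    # Per-column magnitude of change since the cached step.
    delta = np.abs(new_input - cache["input"]).sum(axis=0)
    k = max(1, int(keep_frac * delta.size))
    cols = np.argsort(delta)[-k:]  # indices of the most-changed columns

    # Start from the cached output; overwrite only the selected columns.
    out = cache["output"].copy()
    out[:, cols] = fn(new_input[:, cols])

    # Keep the cache consistent with what was actually recomputed.
    cache["input"][:, cols] = new_input[:, cols]
    cache["output"] = out
    return out

# Toy usage: a doubling "block" where only 5 of 100 columns change.
rng = np.random.default_rng(0)
fn = lambda x: x * 2.0
x0 = rng.standard_normal((4, 100))
cache = {"input": x0.copy(), "output": fn(x0)}
x1 = x0.copy()
x1[:, :5] += 1.0  # slow-changing input: most columns are identical
out = sparse_delta_update(cache, x1, fn, keep_frac=0.07)
```

Because the unchanged columns' cached outputs are still exact, recomputing only the changed columns reproduces the dense result here; in the real setting the selection is approximate, which is why the slow-changing-activation property matters.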
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high