Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas
AI Impact Summary
Chipmunk introduces a training-free acceleration method for diffusion transformers by dynamically computing sparse "deltas" against cached attention and MLP activations. This approach leverages the slow-changing and sparse nature of DiT activations, combined with a hardware-aware sparsity pattern that aligns with GPU tile sizes, achieving up to 3.7x faster video generation. This technique is particularly relevant for teams working with large video generation models like HunyuanVideo and FLUX.1-dev, where reducing compute time is critical.
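To make the core idea concrete, here is a minimal, hypothetical sketch of column-sparse delta caching with numpy. It is not the Chipmunk implementation: the function name, the tile-scoring heuristic, and the use of fully materialized activations for selection are all assumptions made for clarity (the real method selects columns cheaply and fuses the sparse recompute into GPU kernels). The sketch only shows the caching pattern: score tile-aligned column groups by how much they changed since the cached step, refresh only the top fraction, and reuse the cache everywhere else.

```python
import numpy as np

def column_sparse_delta_update(cache, new_act, keep_frac=0.1, tile=16):
    """Hypothetical sketch of column-sparse delta caching.

    `cache` holds activations from a previous diffusion step. Instead of
    accepting the full recompute, we keep only the tile-aligned column
    groups whose activations changed most (a hardware-friendly sparsity
    pattern, since tiles map to GPU tile sizes) and reuse the cached
    values for all other columns.
    """
    n, d = cache.shape
    assert d % tile == 0, "feature dim must be tile-aligned"
    # Score each tile-aligned column group by total absolute change.
    delta = np.abs(new_act - cache)                        # (n, d)
    tile_scores = delta.reshape(n, d // tile, tile).sum(axis=(0, 2))
    # Keep only the top fraction of tiles.
    k = max(1, int(keep_frac * len(tile_scores)))
    top_tiles = np.argsort(tile_scores)[-k:]
    # Refresh just those tiles; everything else stays cached.
    out = cache.copy()
    for t in top_tiles:
        out[:, t * tile:(t + 1) * tile] = new_act[:, t * tile:(t + 1) * tile]
    return out, top_tiles
```

In this toy form the savings are illusory (the dense activations are computed to score the tiles); the point is the update pattern, in which a real kernel would recompute only the selected column tiles.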
Affected Systems
- HunyuanVideo
- FLUX.1-dev
- Date: not specified
- Change type: capability
- Severity: info