Chipmunk: Training-Free Acceleration of Diffusion Transformers
Action Required
Organizations can significantly reduce the computational cost and latency of diffusion transformer models, enabling faster video generation and image processing applications.
AI Impact Summary
Chipmunk introduces a training-free method for accelerating diffusion transformers (DiTs) by dynamically computing sparse "deltas" against cached attention and MLP activations. The approach exploits two observations: DiT activations change slowly across diffusion steps, and the changes that do occur are sparse. Combined with hardware-aware sparsity patterns, this yields significant speedups – up to 3.7x for video generation with HunyuanVideo and 1.6x for image generation with FLUX.1-dev. The key innovation is a column-sparse attention and MLP kernel that reduces memory access by recomputing only the changed columns, reaching up to 93% sparsity.
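The core idea – cache a block's output, then recompute only the columns whose inputs changed most since the last diffusion step – can be illustrated with a minimal NumPy sketch. This is not Chipmunk's actual kernel (which is a fused GPU implementation); the function `sparse_delta_update`, the `keep_frac` parameter, and the cache layout are all illustrative assumptions.

```python
import numpy as np

def sparse_delta_update(cache, new_input, fn, keep_frac=0.07):
    """Recompute fn only for the most-changed input columns.

    cache: dict holding 'input' and 'output' arrays from the previous step.
    fn: a column-wise function standing in for an attention/MLP block.
    keep_frac: fraction of columns recomputed (0.07 ≈ 93% sparsity).
    All names here are illustrative, not Chipmunk's real API.
    """
    # Per-column magnitude of change since the cached step.
    delta = np.abs(new_input - cache["input"]).sum(axis=0)
    k = max(1, int(keep_frac * delta.size))
    cols = np.argsort(delta)[-k:]  # indices of the most-changed columns

    # Start from the cached output; overwrite only the selected columns.
    out = cache["output"].copy()
    out[:, cols] = fn(new_input[:, cols])

    # Keep the cache consistent with what was actually recomputed.
    cache["input"][:, cols] = new_input[:, cols]
    cache["output"] = out
    return out

# Toy usage: a doubling "block" where only 5 of 100 columns change.
rng = np.random.default_rng(0)
fn = lambda x: x * 2.0
x0 = rng.standard_normal((4, 100))
cache = {"input": x0.copy(), "output": fn(x0)}
x1 = x0.copy()
x1[:, :5] += 1.0  # slow-changing input: most columns are identical
out = sparse_delta_update(cache, x1, fn, keep_frac=0.07)
```

Because the unchanged columns' cached outputs are still exact, recomputing only the changed columns reproduces the dense result here; in the real setting the selection is approximate, which is why the slow-changing-activation property matters.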
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high