Nyströmformer: Approximating self-attention in linear time using the Nyström method
AI Impact Summary
Nyströmformer introduces a Nyström-based approximation of the softmax self-attention matrix, constructing query and key landmarks as means over contiguous token segments and yielding an attention computation with O(n) complexity. This enables training and inference on longer input sequences (n up to roughly 4096–8192) with reduced memory, supporting longer-context NLP and CV tasks in HuggingFace/Transformers models. Deployments should plan around landmark sizing (commonly 32 or 64) and around integration with masking, normalization, and the depthwise-convolution skip connection, since the reference implementation omits some of these details. The HuggingFace code path demonstrates a practical integration in PyTorch-based pipelines, making migration feasible for teams evaluating long-sequence models.
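The sketch below illustrates the core idea under stated assumptions: it is a minimal PyTorch example, not the reference implementation. It assumes inputs of shape (batch, n, d) with the sequence length divisible by the landmark count, omits masking and the depthwise-convolution skip connection, and uses torch.linalg.pinv in place of the paper's iterative Moore-Penrose approximation; the function name nystrom_attention is illustrative.

```python
import torch
import torch.nn.functional as F

def nystrom_attention(q, k, v, num_landmarks=64):
    """Minimal sketch of Nystrom-approximated softmax attention.

    q, k, v: (batch, n, d) tensors; assumes n is divisible by num_landmarks.
    """
    b, n, d = q.shape
    m = num_landmarks
    scale = d ** -0.5

    # Landmarks: means over contiguous segments of the query / key sequences.
    q_landmarks = q.reshape(b, m, n // m, d).mean(dim=2)  # (b, m, d)
    k_landmarks = k.reshape(b, m, n // m, d).mean(dim=2)  # (b, m, d)

    # Three small softmax kernels replace the full n x n attention matrix.
    kernel_1 = F.softmax(q @ k_landmarks.transpose(-1, -2) * scale, dim=-1)            # (b, n, m)
    kernel_2 = F.softmax(q_landmarks @ k_landmarks.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    kernel_3 = F.softmax(q_landmarks @ k.transpose(-1, -2) * scale, dim=-1)            # (b, m, n)

    # Paper uses an iterative pseudoinverse approximation; torch.linalg.pinv
    # is used here for brevity. Cost is O(n * m) rather than O(n^2).
    return kernel_1 @ torch.linalg.pinv(kernel_2) @ (kernel_3 @ v)  # (b, n, d)


# Smoke test: 1024 tokens, 64-dim heads, 64 landmarks.
q, k, v = (torch.randn(2, 1024, 64) for _ in range(3))
print(nystrom_attention(q, k, v).shape)  # torch.Size([2, 1024, 64])
```

Because only the three small kernels are materialized, memory and compute scale with n·m instead of n², which is what makes the 4096–8192 token regime practical.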
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info