Nyströmformer: Approximating self-attention in linear time using the Nyström method
AI Impact Summary
Nyströmformer introduces a Nyström-based approximation of the softmax self-attention matrix, constructing query and key landmarks as means over contiguous token segments and yielding an attention computation with O(n) complexity. This enables training and inference on longer input sequences (n up to roughly 4096–8192) with reduced memory, supporting longer-context NLP and CV tasks in HuggingFace/Transformers models. Deployments should plan around landmark sizing (commonly 32 or 64) and around integration with masking, normalization, and the depthwise-convolution skip connection, since the reference implementation omits some of these details. The HuggingFace code path demonstrates a practical integration in PyTorch-based pipelines, making migration feasible for teams evaluating long-sequence models.
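The sketch below illustrates the core idea under stated assumptions: it is a minimal PyTorch example, not the reference implementation. It assumes inputs of shape (batch, n, d) with the sequence length divisible by the landmark count, omits masking and the depthwise-convolution skip connection, and uses torch.linalg.pinv in place of the paper's iterative Moore-Penrose approximation; the function name nystrom_attention is illustrative.

```python
import torch
import torch.nn.functional as F

def nystrom_attention(q, k, v, num_landmarks=64):
    """Minimal sketch of Nystrom-approximated softmax attention.

    q, k, v: (batch, n, d) tensors; assumes n is divisible by num_landmarks.
    """
    b, n, d = q.shape
    m = num_landmarks
    scale = d ** -0.5

    # Landmarks: means over contiguous segments of the query / key sequences.
    q_landmarks = q.reshape(b, m, n // m, d).mean(dim=2)  # (b, m, d)
    k_landmarks = k.reshape(b, m, n // m, d).mean(dim=2)  # (b, m, d)

    # Three small softmax kernels replace the full n x n attention matrix.
    kernel_1 = F.softmax(q @ k_landmarks.transpose(-1, -2) * scale, dim=-1)            # (b, n, m)
    kernel_2 = F.softmax(q_landmarks @ k_landmarks.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    kernel_3 = F.softmax(q_landmarks @ k.transpose(-1, -2) * scale, dim=-1)            # (b, m, n)

    # Paper uses an iterative pseudoinverse approximation; torch.linalg.pinv
    # is used here for brevity. Cost is O(n * m) rather than O(n^2).
    return kernel_1 @ torch.linalg.pinv(kernel_2) @ (kernel_3 @ v)  # (b, n, d)


# Smoke test: 1024 tokens, 64-dim heads, 64 landmarks.
q, k, v = (torch.randn(2, 1024, 64) for _ in range(3))
print(nystrom_attention(q, k, v).shape)  # torch.Size([2, 1024, 64])
```

Because only the three small kernels are materialized, memory and compute scale with n·m instead of n², which is what makes the 4096–8192 token regime practical.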
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info