Hugging Face introduces Xet: Content-Defined Chunking for Improved Storage Efficiency
AI Impact Summary
Hugging Face is introducing Xet, a storage approach that uses content-defined chunking (CDC) to improve storage efficiency and iteration speed. Xet splits files into variable-sized chunks whose boundaries are chosen by a rolling hash over the file contents, so an edit typically changes only the chunks near it and unchanged chunks do not need to be stored or transferred again. This matters most for frequently updated artifacts such as PyTorch checkpoints (200 TB) and datasets, where it reduces storage and transfer costs for users iterating on evolving models and datasets.
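To make the chunking idea concrete, here is a minimal Python sketch of content-defined chunking. It is not Xet's implementation: the Gear-style rolling hash, the names `chunk_boundaries`, `chunks`, and `chunk_hashes`, and the constants `MIN_SIZE`, `MAX_SIZE`, and `MASK` are all assumptions chosen for illustration, with chunk sizes far smaller than a production system would likely use.

```python
import hashlib
import random

# Hypothetical parameters for illustration only (not Xet's real settings).
MIN_SIZE = 64          # never cut a chunk shorter than this
MAX_SIZE = 1024        # always cut once a chunk reaches this length
MASK = (1 << 7) - 1    # cut when the low 7 bits of the hash are zero

# Gear table: one fixed pseudo-random 64-bit value per possible byte.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the end offset of every chunk in `data`."""
    boundaries = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: older bytes shift out of the low bits,
        # so the cut decision depends only on the most recent bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries


def chunks(data: bytes) -> list[bytes]:
    """Split `data` into content-defined chunks."""
    out = []
    prev = 0
    for end in chunk_boundaries(data):
        out.append(data[prev:end])
        prev = end
    return out


def chunk_hashes(data: bytes) -> list[str]:
    """Hash each chunk; identical chunks share a hash and can be deduplicated."""
    return [hashlib.sha256(c).hexdigest() for c in chunks(data)]


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    edited = b"0123456789" + original  # insert a few bytes at the front

    before = set(chunk_hashes(original))
    after = set(chunk_hashes(edited))
    print(f"chunks in original: {len(before)}")
    print(f"chunks reused after edit: {len(before & after)}")
```

Because the cut points depend on the bytes themselves rather than on fixed offsets, inserting a few bytes at the front shifts the data but leaves most boundaries, and therefore most chunk hashes, unchanged. Fixed-size blocks would all shift and hash differently after the same edit.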
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info