Hugging Face Hub adopts CDC-based chunking with Xet-backed storage to reduce Git LFS storage and transfers
AI Impact Summary
Hugging Face is replacing file-level Git LFS storage with content-defined chunking (CDC) and a content-addressed store, provided by an Xet-backed storage backend. Files are split into chunks at boundaries derived from their content, so only modified chunks are uploaded and unchanged chunks are deduplicated across versions. This reduces storage and bandwidth needs for large assets (e.g., Safetensors files of ~1 GB, GGUF files of >8 GB) and speeds up iteration by avoiding full re-uploads. Benchmarks against Git LFS show roughly 50% improvements in storage and transfer performance; in a two-version example of model.safetensors, storage drops by ~53%, with additional savings from compression. Xet-backed repositories are planned for early 2025, indicating a staged rollout and a migration path for teams storing large models and datasets.
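The chunk-and-deduplicate idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the Xet backend's actual algorithm: the gear-style rolling hash, the size limits (`MIN_SIZE`, `MAX_SIZE`, `MASK`), and the helper names `cdc_chunks`, `upload`, and `download` are all assumptions chosen for the example, and the "store" is just an in-memory dictionary keyed by SHA-256.

```python
import hashlib
import random

# Illustrative parameters only; not the values any real backend uses.
MIN_SIZE = 1024        # never cut a chunk shorter than this
MAX_SIZE = 16384       # always cut once a chunk reaches this size
MASK = (1 << 12) - 1   # cut when the low 12 bits of the hash are zero
U64 = (1 << 64) - 1

# Fixed pseudo-random table so chunking is deterministic across runs.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data):
    """Split bytes at content-defined boundaries using a gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & U64
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def upload(store, data):
    """Add data to a content-addressed store; return (manifest, new bytes)."""
    manifest, new_bytes = [], 0
    for chunk in cdc_chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:          # only unseen chunks cost storage/transfer
            store[key] = chunk
            new_bytes += len(chunk)
        manifest.append(key)
    return manifest, new_bytes


def download(store, manifest):
    """Reassemble a file from its ordered list of chunk hashes."""
    return b"".join(store[key] for key in manifest)


if __name__ == "__main__":
    rng = random.Random(1)
    v1 = bytes(rng.getrandbits(8) for _ in range(200_000))
    # Hypothetical second version: a small insertion late in the file.
    v2 = v1[:150_000] + b"fine-tuned weights" + v1[150_000:]

    store = {}
    _, sent_v1 = upload(store, v1)
    manifest_v2, sent_v2 = upload(store, v2)
    assert download(store, manifest_v2) == v2
    print(f"v1 sent {sent_v1} bytes; v2 sent {sent_v2} of {len(v2)} bytes")
```

With fixed-size blocks, the insertion in the second version would shift every later block boundary and force those blocks to be stored again. Because the boundaries here depend on the bytes themselves, chunks before the edit are reused unchanged, and chunks after it typically realign within a chunk or two.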
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info