Hugging Face introduces natively supported quantization schemes in Transformers
AI Impact Summary
Hugging Face has introduced natively supported quantization schemes for Transformers models, primarily through the bitsandbytes and auto-GPTQ libraries. Quantization reduces model size and memory footprint, allowing users to run large models on devices with limited resources. Currently, bitsandbytes offers simpler, zero-shot quantization (no calibration dataset required) and cross-modality interoperability, while auto-GPTQ provides faster inference for text generation and supports n-bit quantization. Both options are available for PyTorch models only; support for TensorFlow and Flax/JAX is planned.
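To illustrate the core idea behind 8-bit quantization, here is a minimal NumPy sketch of absmax int8 quantization, the rounding scheme that bitsandbytes' LLM.int8() applies to non-outlier values. The weight matrix below is synthetic and the shapes are purely illustrative, not taken from any real model.

```python
import numpy as np

# Synthetic fp32 weights standing in for a model layer (illustrative only).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)

# Absmax quantization: choose a scale so the largest-magnitude weight maps to 127,
# then round every weight to the nearest int8 value.
scale = 127.0 / np.max(np.abs(w))
w_int8 = np.round(w * scale).astype(np.int8)

# Dequantize to recover an approximation of the original weights.
w_deq = w_int8.astype(np.float32) / scale

# int8 storage is 4x smaller than fp32, at the cost of a small rounding error.
print("fp32 bytes:", w.nbytes, "int8 bytes:", w_int8.nbytes)
print("max reconstruction error:", float(np.max(np.abs(w - w_deq))))
```

In practice none of this is done by hand: passing a quantization config to `from_pretrained` lets Transformers quantize the weights on load.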
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info