Hugging Face introduces natively supported quantization schemes in Transformers
AI Impact Summary
Hugging Face has introduced natively supported quantization schemes for Transformers models, primarily through the bitsandbytes and auto-GPTQ libraries. Quantization reduces model size and memory footprint, allowing users to run large models on devices with limited resources. Currently, bitsandbytes offers simpler, zero-shot quantization (no calibration dataset required) and cross-modality interoperability, while auto-GPTQ provides faster inference for text generation and supports n-bit quantization. Both options are available for PyTorch models only; support for TensorFlow and Flax/JAX is planned.
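To illustrate the core idea behind 8-bit quantization, here is a minimal NumPy sketch of absmax int8 quantization, the rounding scheme that bitsandbytes' LLM.int8() applies to non-outlier values. The weight matrix below is synthetic and the shapes are purely illustrative, not taken from any real model.

```python
import numpy as np

# Synthetic fp32 weights standing in for a model layer (illustrative only).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)

# Absmax quantization: choose a scale so the largest-magnitude weight maps to 127,
# then round every weight to the nearest int8 value.
scale = 127.0 / np.max(np.abs(w))
w_int8 = np.round(w * scale).astype(np.int8)

# Dequantize to recover an approximation of the original weights.
w_deq = w_int8.astype(np.float32) / scale

# int8 storage is 4x smaller than fp32, at the cost of a small rounding error.
print("fp32 bytes:", w.nbytes, "int8 bytes:", w_int8.nbytes)
print("max reconstruction error:", float(np.max(np.abs(w - w_deq))))
```

In practice none of this is done by hand: passing a quantization config to `from_pretrained` lets Transformers quantize the weights on load.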
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info