YAQA: Weight-only post-training quantization that preserves model outputs via a Fisher information (Hessian) approach
AI Impact Summary
YAQA introduces a weight-only post-training quantization method that preserves the original model's outputs; it is quantizer-agnostic and compatible with hardware datatypes and memory-bound quantizers such as QTIP. Instead of minimizing layerwise activation error, it minimizes KL divergence to the original model using a Kronecker-factored Fisher information (Hessian) approximation, with two Hessian sketching strategies that enable one-pass computation and scale to distributed setups such as FSDP. Across models and quantizers, YAQA reduces KL divergence to the original model by over 30% compared to LDLQ, GPTQ, and AWQ, delivering state-of-the-art downstream performance for quantized LLMs.
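To make the mechanism concrete, below is a minimal PyTorch sketch of the two ingredients the summary describes: estimating a Kronecker-factored Fisher approximation A ⊗ B of a linear layer's Hessian from activations and KL-loss gradients, and using the input-side factor to drive LDLQ/GPTQ-style adaptive rounding. The function names, the plain empirical-Fisher estimate, and the dense Cholesky step are illustrative assumptions, not YAQA's actual implementation, which relies on Hessian sketches to gather these statistics in a single pass.

```python
import torch

def estimate_kfac_factors(acts: torch.Tensor, grads: torch.Tensor):
    # acts: (n_samples, d_in) layer inputs; grads: (n_samples, d_out)
    # gradients of the KL loss w.r.t. the layer's outputs. The layer's
    # Fisher/Hessian is approximated as the Kronecker product A ⊗ B.
    A = acts.T @ acts / acts.shape[0]     # input-side factor, (d_in, d_in)
    B = grads.T @ grads / grads.shape[0]  # output-side factor, (d_out, d_out)
    return A, B

def adaptive_round(W: torch.Tensor, A: torch.Tensor, quantize, damp=1e-2):
    # LDLQ/GPTQ-style sequential rounding, driven by the Fisher factor A
    # rather than raw activation statistics. W: (d_out, d_in).
    d = A.shape[0]
    H = A + damp * A.diagonal().mean() * torch.eye(d)  # damping for stability
    Hinv = torch.cholesky_inverse(torch.linalg.cholesky(H))
    W, Q = W.clone(), torch.zeros_like(W)
    for j in range(d):                   # quantize one input dimension at a time
        Q[:, j] = quantize(W[:, j])      # quantizer-agnostic rounding step
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # Propagate the rounding error onto the not-yet-quantized columns.
        W[:, j + 1:] -= err[:, None] * Hinv[j, j + 1:][None, :]
    return Q

# Illustrative usage with random tensors and a toy uniform grid.
W = torch.randn(256, 512)
acts, grads = torch.randn(1024, 512), torch.randn(1024, 256)
A, _ = estimate_kfac_factors(acts, grads)
Q = adaptive_round(W, A, quantize=lambda w: torch.round(w * 4) / 4)
```

The `quantize` callback is where a concrete quantizer (a hardware datatype grid, or QTIP's trellis codes) would plug in, which is what makes the approach quantizer-agnostic.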
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info