YAQA: Weight-only post-training quantization that preserves model outputs via a Fisher information (Hessian) approach
AI Impact Summary
YAQA introduces a weight-only post-training quantization method that preserves the original model's outputs; it is quantizer-agnostic and compatible with hardware datatypes and memory-bound quantizers such as QTIP. Instead of minimizing layerwise activation error, it minimizes KL divergence to the original model using a Kronecker-factored Fisher information (Hessian) approximation, with two Hessian sketching strategies that enable one-pass computation and scale to distributed setups such as FSDP. Across models and quantizers, YAQA reduces KL divergence to the original model by over 30% compared to LDLQ, GPTQ, and AWQ, delivering state-of-the-art downstream performance for quantized LLMs.
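To make the mechanism concrete, below is a minimal PyTorch sketch of the two ingredients the summary describes: estimating a Kronecker-factored Fisher approximation A ⊗ B of a linear layer's Hessian from activations and KL-loss gradients, and using the input-side factor to drive LDLQ/GPTQ-style adaptive rounding. The function names, the plain empirical-Fisher estimate, and the dense Cholesky step are illustrative assumptions, not YAQA's actual implementation, which relies on Hessian sketches to gather these statistics in a single pass.

```python
import torch

def estimate_kfac_factors(acts: torch.Tensor, grads: torch.Tensor):
    # acts: (n_samples, d_in) layer inputs; grads: (n_samples, d_out)
    # gradients of the KL loss w.r.t. the layer's outputs. The layer's
    # Fisher/Hessian is approximated as the Kronecker product A ⊗ B.
    A = acts.T @ acts / acts.shape[0]     # input-side factor, (d_in, d_in)
    B = grads.T @ grads / grads.shape[0]  # output-side factor, (d_out, d_out)
    return A, B

def adaptive_round(W: torch.Tensor, A: torch.Tensor, quantize, damp=1e-2):
    # LDLQ/GPTQ-style sequential rounding, driven by the Fisher factor A
    # rather than raw activation statistics. W: (d_out, d_in).
    d = A.shape[0]
    H = A + damp * A.diagonal().mean() * torch.eye(d)  # damping for stability
    Hinv = torch.cholesky_inverse(torch.linalg.cholesky(H))
    W, Q = W.clone(), torch.zeros_like(W)
    for j in range(d):                   # quantize one input dimension at a time
        Q[:, j] = quantize(W[:, j])      # quantizer-agnostic rounding step
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # Propagate the rounding error onto the not-yet-quantized columns.
        W[:, j + 1:] -= err[:, None] * Hinv[j, j + 1:][None, :]
    return Q

# Illustrative usage with random tensors and a toy uniform grid.
W = torch.randn(256, 512)
acts, grads = torch.randn(1024, 512), torch.randn(1024, 256)
A, _ = estimate_kfac_factors(acts, grads)
Q = adaptive_round(W, A, quantize=lambda w: torch.round(w * 4) / 4)
```

The `quantize` callback is where a concrete quantizer (a hardware datatype grid, or QTIP's trellis codes) would plug in, which is what makes the approach quantizer-agnostic.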
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info