YAQA: Model-Preserving Adaptive Rounding
AI Impact Summary
YAQA introduces a model-preserving adaptive rounding algorithm for LLM quantization. It uses a Kronecker-factored approximation of each linear layer's Hessian, estimated with Hessian sketches and power iteration, to round weights so that the KL divergence to the original model is minimized. This yields more than a 30% reduction in KL divergence compared with existing rounding methods such as LDLQ, GPTQ, and AWQ. Adapting the rounding to this Hessian feedback is a notable advance in post-training quantization, improving the accuracy of quantized models.
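For illustration only, the sketch below shows the general ingredients named above on a toy linear layer: a Kronecker-style Hessian proxy built from sampled inputs and output gradients, power iteration to probe each factor's dominant curvature direction, and an error-feedback rounding sweep in the spirit of GPTQ/LDLQ. It is not YAQA's actual algorithm; all function names, shapes, and the 4-bit grid are assumptions made for this example.

```python
# Minimal, illustrative NumPy sketch (not the authors' implementation) of:
#   1) a Kronecker-style Hessian proxy H ~ B (x) A from sampled inputs and
#      output gradients,
#   2) power iteration for the dominant curvature direction of each factor,
#   3) a GPTQ/LDLQ-style error-feedback rounding sweep (not YAQA itself).
# All names, shapes, and the toy 4-bit grid are assumptions for this example.
import numpy as np


def kronecker_factors(X, G):
    """Proxy factors: A from inputs X (n, d_in), B from output grads G (n, d_out)."""
    n = X.shape[0]
    A = X.T @ X / n          # input-side second moment, (d_in, d_in)
    B = G.T @ G / n          # output-side second moment, (d_out, d_out)
    return A, B


def power_iteration(M, iters=50, seed=0):
    """Top eigenpair of a symmetric PSD matrix M via power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return float(v @ M @ v), v


def error_feedback_round(W, H, grid, damp=1e-2):
    """Round W (d_out, d_in) column by column against the input-side Hessian
    proxy H (d_in, d_in), propagating quantization error to later columns."""
    d_in = W.shape[1]
    Hd = H + damp * np.mean(np.diag(H)) * np.eye(d_in)   # damping for stability
    Hinv = np.linalg.inv(Hd)
    Wq = W.copy()
    for j in range(d_in):
        # Nearest grid point for column j, then feed the error forward.
        q = grid[np.argmin(np.abs(grid[None, :] - Wq[:, j][:, None]), axis=1)]
        err = (Wq[:, j] - q) / Hinv[j, j]
        Wq[:, j] = q
        if j + 1 < d_in:
            Wq[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Wq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 64))        # sampled layer inputs
    G = rng.standard_normal((256, 32))        # sampled output gradients
    W = 0.1 * rng.standard_normal((32, 64))   # full-precision weights
    A, B = kronecker_factors(X, G)
    lam_in, _ = power_iteration(A)
    lam_out, _ = power_iteration(B)
    grid = np.linspace(-0.3, 0.3, 16)         # toy 4-bit uniform grid
    Wq = error_feedback_round(W, A, grid)
    nearest = grid[np.argmin(np.abs(grid[None, None, :] - W[..., None]), axis=-1)]
    proxy = lambda M: float(np.trace((M - W) @ A @ (M - W).T))  # quadratic proxy loss
    print(f"top curvature (input/output factors): {lam_in:.2f} / {lam_out:.2f}")
    print(f"proxy loss  nearest: {proxy(nearest):.4f}   error-feedback: {proxy(Wq):.4f}")
```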
Affected Systems
Business Impact
This quantization method enables the deployment of more accurate and efficient LLMs, potentially reducing inference costs and improving the performance of applications that rely on these models.
- Date: not specified
- Change type: capability
- Severity: info