YAQA: Model-Preserving Adaptive Rounding for LLMs Released
Action Required
Organizations can deploy more accurate quantized LLMs on resource-constrained hardware, reducing inference costs while better preserving model quality after quantization.
AI Impact Summary
This announcement details the release of YAQA, a quantization algorithm for Large Language Models (LLMs) that preserves model outputs, reporting a >30% reduction in KL divergence compared to existing rounding methods. The core innovation is using a Kronecker-factored approximation of the model's Hessian to directly minimize the KL divergence between the original and quantized model's outputs. This is particularly valuable for deploying LLMs on resource-constrained hardware, because the approach is agnostic to the underlying quantizer and the Hessian approximation can be computed efficiently.
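As a rough illustration of the idea only (not the released YAQA implementation), the sketch below scores candidate roundings of a weight matrix with a quadratic proxy built from a Kronecker-factored Hessian approximation, then greedily adjusts entries to reduce that proxy. The function names, the factors `A` and `B`, and the grid `step` are hypothetical placeholders; in a real pipeline the Hessian factors would be estimated from the model rather than drawn at random.

```python
import numpy as np

# Minimal, illustrative sketch (not the released YAQA code): adaptive rounding
# that scores candidate roundings with a Kronecker-factored quadratic proxy.
# All names (kron_proxy_loss, greedy_adaptive_round, A, B, step) are
# hypothetical placeholders for this sketch.

def kron_proxy_loss(W, W_q, A, B):
    """Quadratic proxy for output change under a Kronecker-factored Hessian:
    vec(dW)^T (B (x) A) vec(dW) = tr(A dW B dW^T) for symmetric A, B."""
    dW = W - W_q
    return float(np.trace(A @ dW @ B @ dW.T))

def greedy_adaptive_round(W, A, B, step=0.05):
    """Toy adaptive rounding: start from nearest rounding on a uniform grid,
    then move each entry to floor/ceil if that lowers the proxy loss."""
    W_q = np.round(W / step) * step
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            best = W_q[i, j]
            best_loss = kron_proxy_loss(W, W_q, A, B)
            for cand in (np.floor(W[i, j] / step) * step,
                         np.ceil(W[i, j] / step) * step):
                W_q[i, j] = cand
                loss = kron_proxy_loss(W, W_q, A, B)
                if loss < best_loss:
                    best, best_loss = cand, loss
            W_q[i, j] = best
    return W_q

rng = np.random.default_rng(0)
d_out, d_in = 8, 16
W = rng.standard_normal((d_out, d_in))

# Hypothetical symmetric positive-definite Kronecker factors; in practice
# such factors would be estimated from the model, not drawn at random.
A = np.eye(d_out) + 0.1 * rng.standard_normal((d_out, d_out)); A = A @ A.T
B = np.eye(d_in) + 0.1 * rng.standard_normal((d_in, d_in)); B = B @ B.T

nearest = np.round(W / 0.05) * 0.05
adaptive = greedy_adaptive_round(W, A, B, step=0.05)
print("nearest-rounding proxy loss :", kron_proxy_loss(W, nearest, A, B))
print("adaptive-rounding proxy loss:", kron_proxy_loss(W, adaptive, A, B))
```

The design point the sketch tries to convey is that the rounding decision is driven by a model-level sensitivity measure (here a Kronecker-factored quadratic form) rather than by per-weight nearest rounding alone.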
Affected Systems
- Date: Not specified
- Change type: Capability
- Severity: High