YAQA: Model-Preserving Adaptive Rounding
AI Impact Summary
YAQA introduces a model-preserving adaptive rounding algorithm for LLM quantization. It uses a Kronecker-factored approximation of each linear layer's Hessian, estimated with Hessian sketches and power iteration, to round weights so that the KL divergence to the original model is minimized. This yields more than a 30% reduction in KL divergence compared with existing rounding methods such as LDLQ, GPTQ, and AWQ. Adapting the rounding to this Hessian feedback is a notable advance in post-training quantization, improving the accuracy of quantized models.
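For illustration only, the sketch below shows the general ingredients named above on a toy linear layer: a Kronecker-style Hessian proxy built from sampled inputs and output gradients, power iteration to probe each factor's dominant curvature direction, and an error-feedback rounding sweep in the spirit of GPTQ/LDLQ. It is not YAQA's actual algorithm; all function names, shapes, and the 4-bit grid are assumptions made for this example.

```python
# Minimal, illustrative NumPy sketch (not the authors' implementation) of:
#   1) a Kronecker-style Hessian proxy H ~ B (x) A from sampled inputs and
#      output gradients,
#   2) power iteration for the dominant curvature direction of each factor,
#   3) a GPTQ/LDLQ-style error-feedback rounding sweep (not YAQA itself).
# All names, shapes, and the toy 4-bit grid are assumptions for this example.
import numpy as np


def kronecker_factors(X, G):
    """Proxy factors: A from inputs X (n, d_in), B from output grads G (n, d_out)."""
    n = X.shape[0]
    A = X.T @ X / n          # input-side second moment, (d_in, d_in)
    B = G.T @ G / n          # output-side second moment, (d_out, d_out)
    return A, B


def power_iteration(M, iters=50, seed=0):
    """Top eigenpair of a symmetric PSD matrix M via power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return float(v @ M @ v), v


def error_feedback_round(W, H, grid, damp=1e-2):
    """Round W (d_out, d_in) column by column against the input-side Hessian
    proxy H (d_in, d_in), propagating quantization error to later columns."""
    d_in = W.shape[1]
    Hd = H + damp * np.mean(np.diag(H)) * np.eye(d_in)   # damping for stability
    Hinv = np.linalg.inv(Hd)
    Wq = W.copy()
    for j in range(d_in):
        # Nearest grid point for column j, then feed the error forward.
        q = grid[np.argmin(np.abs(grid[None, :] - Wq[:, j][:, None]), axis=1)]
        err = (Wq[:, j] - q) / Hinv[j, j]
        Wq[:, j] = q
        if j + 1 < d_in:
            Wq[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Wq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 64))        # sampled layer inputs
    G = rng.standard_normal((256, 32))        # sampled output gradients
    W = 0.1 * rng.standard_normal((32, 64))   # full-precision weights
    A, B = kronecker_factors(X, G)
    lam_in, _ = power_iteration(A)
    lam_out, _ = power_iteration(B)
    grid = np.linspace(-0.3, 0.3, 16)         # toy 4-bit uniform grid
    Wq = error_feedback_round(W, A, grid)
    nearest = grid[np.argmin(np.abs(grid[None, None, :] - W[..., None]), axis=-1)]
    proxy = lambda M: float(np.trace((M - W) @ A @ (M - W).T))  # quadratic proxy loss
    print(f"top curvature (input/output factors): {lam_in:.2f} / {lam_out:.2f}")
    print(f"proxy loss  nearest: {proxy(nearest):.4f}   error-feedback: {proxy(Wq):.4f}")
```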
Affected Systems
Business Impact
This quantization method enables the deployment of more accurate and efficient LLMs, potentially reducing inference costs and improving the performance of applications that rely on these models.
- Date: not specified
- Change type: capability
- Severity: info