YAQA: Model-Preserving Adaptive Rounding for LLMs Released
Action Required
Organizations can deploy more accurate quantized LLMs on resource-constrained hardware, reducing inference costs while better preserving model quality after quantization.
AI Impact Summary
This announcement details the release of YAQA, a quantization algorithm for Large Language Models (LLMs) that preserves model outputs, reporting a >30% reduction in KL divergence compared to existing rounding methods. The core innovation is using a Kronecker-factored approximation of the model's Hessian to directly minimize the KL divergence between the original and quantized model's outputs. This is particularly valuable for deploying LLMs on resource-constrained hardware, because the approach is agnostic to the underlying quantizer and the Hessian approximation can be computed efficiently.
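As a rough illustration of the idea only (not the released YAQA implementation), the sketch below scores candidate roundings of a weight matrix with a quadratic proxy built from a Kronecker-factored Hessian approximation, then greedily adjusts entries to reduce that proxy. The function names, the factors `A` and `B`, and the grid `step` are hypothetical placeholders; in a real pipeline the Hessian factors would be estimated from the model rather than drawn at random.

```python
import numpy as np

# Minimal, illustrative sketch (not the released YAQA code): adaptive rounding
# that scores candidate roundings with a Kronecker-factored quadratic proxy.
# All names (kron_proxy_loss, greedy_adaptive_round, A, B, step) are
# hypothetical placeholders for this sketch.

def kron_proxy_loss(W, W_q, A, B):
    """Quadratic proxy for output change under a Kronecker-factored Hessian:
    vec(dW)^T (B (x) A) vec(dW) = tr(A dW B dW^T) for symmetric A, B."""
    dW = W - W_q
    return float(np.trace(A @ dW @ B @ dW.T))

def greedy_adaptive_round(W, A, B, step=0.05):
    """Toy adaptive rounding: start from nearest rounding on a uniform grid,
    then move each entry to floor/ceil if that lowers the proxy loss."""
    W_q = np.round(W / step) * step
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            best = W_q[i, j]
            best_loss = kron_proxy_loss(W, W_q, A, B)
            for cand in (np.floor(W[i, j] / step) * step,
                         np.ceil(W[i, j] / step) * step):
                W_q[i, j] = cand
                loss = kron_proxy_loss(W, W_q, A, B)
                if loss < best_loss:
                    best, best_loss = cand, loss
            W_q[i, j] = best
    return W_q

rng = np.random.default_rng(0)
d_out, d_in = 8, 16
W = rng.standard_normal((d_out, d_in))

# Hypothetical symmetric positive-definite Kronecker factors; in practice
# such factors would be estimated from the model, not drawn at random.
A = np.eye(d_out) + 0.1 * rng.standard_normal((d_out, d_out)); A = A @ A.T
B = np.eye(d_in) + 0.1 * rng.standard_normal((d_in, d_in)); B = B @ B.T

nearest = np.round(W / 0.05) * 0.05
adaptive = greedy_adaptive_round(W, A, B, step=0.05)
print("nearest-rounding proxy loss :", kron_proxy_loss(W, nearest, A, B))
print("adaptive-rounding proxy loss:", kron_proxy_loss(W, adaptive, A, B))
```

The design point the sketch tries to convey is that the rounding decision is driven by a model-level sensitivity measure (here a Kronecker-factored quadratic form) rather than by per-weight nearest rounding alone.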
Affected Systems
- Date: Not specified
- Change type: Capability
- Severity: High