Phi-2 4-bit quantization on Intel Meteor Lake enables on-device LLM inference
AI Impact Summary
This post documents running Microsoft's Phi-2 (2.7B parameters) on an Intel Meteor Lake laptop by quantizing its weights to 4-bit with a 0.8 ratio (i.e. roughly 80% of the weights in 4-bit precision, the rest in 8-bit) using OpenVINO via Optimum Intel. Inference runs through the Hugging Face transformers API (OVModelForCausalLM) on a mid-range Core Ultra, which can draw on the CPU's vector units, the integrated GPU's XVE units, and the NPU for acceleration. This approach enables private, offline LLM use with reduced latency and no cloud costs, but accuracy and performance are sensitive to the quantization settings and hardware specifics, so production workloads require validation.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info