Phi-2 4-bit quantization on Intel Meteor Lake enables on-device LLM inference
AI Impact Summary
This post documents running Microsoft's Phi-2 (2.7B parameters) on an Intel Meteor Lake laptop by quantizing its weights to 4-bit with a 0.8 ratio (i.e. roughly 80% of the weights in 4-bit precision, the rest in 8-bit) using OpenVINO via Optimum Intel. Inference runs through the Hugging Face transformers API (OVModelForCausalLM) on a mid-range Core Ultra, which can draw on the CPU's vector units, the integrated GPU's XVE units, and the NPU for acceleration. This approach enables private, offline LLM use with reduced latency and no cloud costs, but accuracy and performance are sensitive to the quantization settings and hardware specifics, so production workloads require validation.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info