Arm ExecuTorch 0.7 enables on-device GenAI via KleidiAI on billions of Arm devices
AI Impact Summary
Arm is enabling on-device GenAI at scale: ExecuTorch 0.7 turns KleidiAI on by default, and KleidiAI delivers automatic acceleration across edge frameworks and runtimes such as XNNPACK, MediaPipe, MNN, ONNX Runtime, and llama.cpp. It uses the SDOT (Armv8.2+) and I8MM (Armv8.6+) instructions to accelerate Int8/Int4 matrix multiplications, which makes Llama 3.2 1B practical on billions of Arm-based devices. Claimed gains include roughly 20% higher prefill performance, 350 tokens/s prefill, and 40 tokens/s decode on a Galaxy S24+. This broadens on-device GenAI to Android devices and edge platforms such as Raspberry Pi 5, supporting private, offline tasks like local chat and context-aware editing. Teams should plan integration with ExecuTorch and KleidiAI and verify SDOT/I8MM support on their target devices.
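To make the "verify SDOT/I8MM support" step concrete, here is a minimal sketch in Python, assuming the target runs Linux or Android and exposes `/proc/cpuinfo`. It relies on the Linux arm64 hwcap names `asimddp` (the dot-product extension that provides SDOT) and `i8mm`. The helper names `parse_cpu_features` and `kleidiai_relevant_support` are hypothetical, chosen for illustration only, and are not part of ExecuTorch or KleidiAI. The check is deliberately conservative: a feature counts as present only if every core's `Features` line lists it.

```python
from pathlib import Path

# Linux arm64 hwcap names as they appear on the "Features" line of /proc/cpuinfo.
# "asimddp" is the dot-product extension (SDOT/UDOT); "i8mm" is Int8 matrix multiply.
FEATURE_FLAGS = {"SDOT": "asimddp", "I8MM": "i8mm"}


def parse_cpu_features(cpuinfo_text):
    """Return the flags that every 'Features' line (one per core) has in common."""
    per_core = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            per_core.append(set(value.split()))
    # No "Features" lines (for example on a non-Arm host) means nothing is confirmed.
    return set.intersection(*per_core) if per_core else set()


def kleidiai_relevant_support(cpuinfo_text):
    """Map each instruction-set feature of interest to True/False (hypothetical helper)."""
    flags = parse_cpu_features(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(kleidiai_relevant_support(path.read_text()))
    else:
        print("No /proc/cpuinfo here; run this on the Linux or Android target.")
```

On an Android device the same file can be read over `adb shell cat /proc/cpuinfo` and passed to `kleidiai_relevant_support` on the host. A result with SDOT present but I8MM absent suggests the device can still use the dot-product path, though not the I8MM one.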
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info