Running Mistral 7B on-device with Core ML (WWDC 24)
AI Impact Summary
WWDC 24 demonstrated on-device LLM inference on Apple Silicon using Core ML, enabling private, low-latency AI within apps. The workflow uses a preview branch of swift-transformers to port Mistral 7B to Core ML, employing new APIs such as MLTensor and stateful buffers (which persist the KV cache across decoding steps) together with block-wise 4-bit quantization so the model fits in roughly 4 GB of RAM on a Mac. The article outlines concrete reproduction steps: clone the preview branch of swift-transformers, download the converted Core ML model from Hugging Face, and run inference from Swift, indicating a practical on-device path for 7B-scale models on macOS 15 / iOS 18 and later. This capability paves the way for privacy-preserving, offline AI features in consumer apps, but it requires Apple Silicon hardware and updated Core ML tooling across the stack.
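As a rough illustration of the stateful decoding loop described above, the sketch below uses Core ML's 2024 stateful-prediction API (MLState) to keep the KV cache alive between token steps. The package file name and the "inputIds" feature name are assumptions for illustration only; the actual converted Mistral model defines its own input/output names, and the swift-transformers preview branch wraps this loop behind a higher-level generation API.

```swift
import Foundation
import CoreML

// Minimal sketch of stateful decoding with Core ML (macOS 15 / iOS 18+).
// Assumed names: the converted package "StatefulMistral7BInstructInt4.mlpackage"
// and an "inputIds" input feature; substitute the names your model declares.
let packageURL = URL(fileURLWithPath: "StatefulMistral7BInstructInt4.mlpackage")
let compiledURL = try await MLModel.compileModel(at: packageURL)

let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU  // 7B-scale models run best on the GPU on Macs
let model = try MLModel(contentsOf: compiledURL, configuration: config)

// A single MLState owns the KV-cache buffers; reusing it across calls means
// each step only processes the newest token rather than the whole prompt.
let state = model.makeState()

func decodeStep(tokenID: Int32) throws -> any MLFeatureProvider {
    let ids = try MLMultiArray(shape: [1, 1], dataType: .int32)
    ids[0] = NSNumber(value: tokenID)
    let input = try MLDictionaryFeatureProvider(dictionary: ["inputIds": ids])
    // `using: state` tells Core ML to read and update the persistent KV cache.
    return try model.prediction(from: input, using: state)
}
```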
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info