WWDC 24: Running Mistral 7B on-device with Core ML
AI Impact Summary
WWDC 24 showcases on-device LLM deployment by running Mistral 7B through Core ML on Apple Silicon, enabling private, low-latency inference on iOS 18 and macOS Sequoia. It highlights a full toolchain: a fork of swift-transformers, converted Core ML models, and a Swift CLI for running inference, plus new Core ML capabilities, MLTensor and stateful buffers, that hold the key-value (KV) cache in place and reduce memory-bandwidth pressure for 7B-class models. For technical teams, this indicates a viable on-device path for mid-sized LLMs, with memory footprints around 4 GB on Mac hardware and opportunities to cut cloud costs and latency, provided you manage model conversion, 4-bit quantization, and Core ML tooling compatibility. Migration considerations include adopting the Core ML conversion workflow, using the new Swift tensor APIs, and targeting the Apple Silicon compute units (CPU, GPU, and Neural Engine) on iOS 18 and macOS Sequoia.
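As a rough illustration of how the stateful KV-cache flow fits together at the Core ML API level, the Swift sketch below runs greedy token-by-token decoding against a converted model. The model path (Mistral7B.mlmodelc) and feature names (inputIds, logits) are hypothetical placeholders, not names taken from the session; the state APIs shown (makeState(), prediction(from:using:)) are the ones Core ML introduces with iOS 18 and macOS Sequoia.

```swift
import Foundation
import CoreML

// Minimal sketch of stateful decoding with Core ML on Apple Silicon.
// Assumes a converted, 4-bit-quantized Mistral 7B package; the file name
// ("Mistral7B.mlmodelc") and feature names ("inputIds", "logits") are
// hypothetical and depend on how the model was converted.
@available(macOS 15.0, iOS 18.0, *)
func generate(promptTokens: [Int32], maxNewTokens: Int) throws -> [Int32] {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndGPU  // 7B-class models typically run on the GPU on Mac

    let url = URL(fileURLWithPath: "Mistral7B.mlmodelc")  // hypothetical path
    let model = try MLModel(contentsOf: url, configuration: config)

    // A stateful model keeps its KV cache inside a state buffer, so after
    // the prompt is processed each prediction feeds only the newest token.
    let state = model.makeState()

    var tokens = promptTokens
    var nextInput = promptTokens  // first step: the whole prompt

    for _ in 0..<maxNewTokens {
        let inputIds = try MLMultiArray(
            shape: [1, NSNumber(value: nextInput.count)], dataType: .int32)
        for (i, t) in nextInput.enumerated() {
            inputIds[i] = NSNumber(value: t)
        }
        let input = try MLDictionaryFeatureProvider(
            dictionary: ["inputIds": MLFeatureValue(multiArray: inputIds)])

        // prediction(from:using:) threads the KV-cache state through the call.
        let output = try model.prediction(from: input, using: state)
        guard let logits = output.featureValue(for: "logits")?.multiArrayValue else {
            break
        }

        // Greedy decoding: argmax over the last position's logits, assuming a
        // contiguous [1, seqLen, vocabSize] layout.
        let vocabSize = logits.shape.last!.intValue
        let offset = logits.count - vocabSize
        var best: (token: Int32, score: Float) = (0, -.infinity)
        for v in 0..<vocabSize {
            let score = logits[offset + v].floatValue
            if score > best.score { best = (Int32(v), score) }
        }
        tokens.append(best.token)
        nextInput = [best.token]  // only the new token; the state holds the rest
    }
    return tokens
}
```

Because the KV cache lives in the model's state buffer rather than being passed in and out as an I/O tensor, each decoding step transfers only the new token and the logits, which is what relieves the memory-bandwidth pressure the summary mentions.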
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info