
On-device OCR with CoreML and MLX: converting dots.ocr (3B) for iOS

AI Impact Summary

This article details how to convert the 3B-parameter OCR model dots.ocr to run on Apple devices, using CoreML for the vision encoder and MLX for the language-model backbone. It outlines a PyTorch-to-CoreML workflow built on coremltools (via torch.jit.trace or torch.export), starting on the GPU in FLOAT32 with static shapes and then working through conversion blockers one at a time (dtype mismatches, repeat_interleave, masking, dynamic loops) to reach Neural Engine compatibility. The result is offline, on-device OCR that needs no network calls or API keys, but getting there takes substantial engineering: simplifying the model, constraining it to single-image inference, and validating performance on iOS devices.
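The starting point of that workflow (trace in PyTorch, then convert with coremltools on GPU, FLOAT32, static shapes) can be sketched roughly as follows. This is a minimal illustration, not the article's code: `TinyVisionEncoder` is a hypothetical stand-in for the dots.ocr vision encoder, and the input name `pixels`, the 224x224 image size, the patch size of 14, and the iOS 17 deployment target are all assumptions chosen for the example.

```python
import torch
import torch.nn as nn


class TinyVisionEncoder(nn.Module):
    """Hypothetical stand-in for a vision encoder (NOT the dots.ocr model)."""

    def __init__(self):
        super().__init__()
        # Patch embedding: one 16-dim token per 14x14 patch (sizes are assumptions).
        self.patch_embed = nn.Conv2d(3, 16, kernel_size=14, stride=14)

    def forward(self, pixels):
        x = self.patch_embed(pixels)         # (1, 16, H/14, W/14)
        return x.flatten(2).transpose(1, 2)  # (1, num_patches, 16)


def trace_encoder(height=224, width=224):
    """Trace the model with a single, fixed-size FLOAT32 image (static shape)."""
    model = TinyVisionEncoder().eval()
    example = torch.rand(1, 3, height, width, dtype=torch.float32)
    with torch.no_grad():
        traced = torch.jit.trace(model, example)
    return traced, example


def convert_to_coreml(traced, example):
    """Convert the traced module to a CoreML ML Program targeting CPU and GPU."""
    # Imported lazily so the tracing step can be tried without coremltools installed.
    import coremltools as ct

    return ct.convert(
        traced,
        inputs=[ct.TensorType(name="pixels", shape=tuple(example.shape))],
        convert_to="mlprogram",
        compute_precision=ct.precision.FLOAT32,    # start in full precision
        compute_units=ct.ComputeUnit.CPU_AND_GPU,  # leave out the Neural Engine at first
        minimum_deployment_target=ct.target.iOS17,
    )


if __name__ == "__main__":
    traced, example = trace_encoder()
    mlmodel = convert_to_coreml(traced, example)
    mlmodel.save("TinyVisionEncoder.mlpackage")
```

Fixing the input to one image of one size keeps the traced graph free of data-dependent control flow, which is why a static-shape FLOAT32 conversion is a reasonable first target before attempting lower precision or the Neural Engine.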

Affected Systems

dots.ocr, CoreML

Date: not specified
Change type: capability
Severity: info

