On-device OCR with CoreML and MLX: converting dots.ocr (3B) for iOS
AI Impact Summary
This article describes how to convert dots.ocr, a 3B-parameter OCR model, to run on-device on Apple hardware, using CoreML for the vision encoder and MLX for the language-model backbone. It outlines a PyTorch-to-CoreML workflow with coremltools (via torch.jit.trace or torch.export), starting from a FLOAT32 GPU configuration with static shapes and then working through conversion blockers one at a time (dtype mismatches, repeat_interleave, masking, dynamic loops) to reach Neural Engine compatibility. The result is offline, on-device OCR with no network calls and no API keys, but getting there requires substantial engineering: simplifying the model, constraining it to single-image inference, and validating performance on iOS devices.
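To make the starting point of the workflow described above concrete, here is a minimal sketch of tracing a module with a static input shape and converting it with coremltools in FLOAT32 for CPU and GPU. `TinyEncoder`, the input name `pixels`, the 224x224 input size, and the output file name are all hypothetical stand-ins chosen for illustration; the actual dots.ocr vision encoder is far larger and would likely need the simplifications mentioned above before it converts cleanly.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a vision encoder; NOT the dots.ocr architecture."""

    def __init__(self) -> None:
        super().__init__()
        # Patchify with a strided convolution (14x14 patches, 8 channels).
        self.proj = nn.Conv2d(3, 8, kernel_size=14, stride=14)

    def forward(self, pixels: torch.Tensor) -> torch.Tensor:
        # (1, 3, H, W) -> (1, 8, H/14, W/14) -> (1, num_patches, 8)
        return self.proj(pixels).flatten(2).transpose(1, 2)


def trace_static(model: nn.Module, shape: tuple):
    """Trace the model on a single example input of fixed shape."""
    model.eval()
    example = torch.rand(*shape)
    with torch.no_grad():
        return torch.jit.trace(model, example)


def convert_to_coreml(traced, shape: tuple):
    """Convert a traced module to an ML Program in FLOAT32 on CPU and GPU."""
    # Imported lazily so the tracing step can be tried without coremltools.
    import coremltools as ct

    return ct.convert(
        traced,
        inputs=[ct.TensorType(name="pixels", shape=shape)],  # static shape
        convert_to="mlprogram",
        compute_precision=ct.precision.FLOAT32,
        compute_units=ct.ComputeUnit.CPU_AND_GPU,
    )


if __name__ == "__main__":
    shape = (1, 3, 224, 224)  # assumed single-image input size
    traced = trace_static(TinyEncoder(), shape)
    mlmodel = convert_to_coreml(traced, shape)
    mlmodel.save("TinyEncoder.mlpackage")  # hypothetical output path
```

Starting in FLOAT32 on CPU and GPU keeps numerical differences from the PyTorch reference small, so a conversion failure is easier to attribute to a specific unsupported operation than to precision or Neural Engine constraints.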
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info