Google PaliGemma 2 Mix: new vision-language models (3B/10B/28B) across multiple resolutions
AI Impact Summary
Google introduces PaliGemma 2 Mix, a set of fine-tuned checkpoints from the PaliGemma 2 family (built on SigLIP and Gemma 2), available in 3B, 10B, and 28B parameter sizes at 224x224, 448x448, and 896x896 resolutions. Because the mix variants are fine-tuned on a mixture of vision-language tasks such as OCR, captioning, and VQA, they serve as quick proxies for how the base models will perform after downstream fine-tuning, letting teams gauge expected performance without task-specific training. The release reinforces the family's emphasis on transfer learning rather than general-purpose chat, with explicit demonstrations of subtask capabilities via prompts such as "ocr" and "caption". For engineering teams, it provides concrete model identifiers and task prompts for benchmarking and for planning resource allocation across the different sizes and resolutions.
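To make the "model identifiers and task prompts" point concrete, here is a minimal sketch of how a team might enumerate candidate checkpoints and prompts for a benchmarking sweep. The repo-id pattern `google/paligemma2-{size}-mix-{res}` and the exact prompt strings are assumptions inferred from the sizes, resolutions, and prompts named above; verify the actual identifiers on the model hub before use.

```python
# Assumed sizes and resolutions, taken from the summary above.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448, 896)

# Task prompts mentioned for the mix checkpoints ("ocr", "caption").
# The "answer en {question}" VQA form is a hypothetical placeholder.
TASK_PROMPTS = {
    "ocr": "ocr",
    "caption": "caption en",
    "vqa": "answer en {question}",
}

def repo_id(size: str, resolution: int) -> str:
    """Build a candidate hub repo id for a mix checkpoint.

    The naming pattern is an assumption, not a confirmed identifier.
    """
    if size not in SIZES or resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported size/resolution: {size}/{resolution}")
    return f"google/paligemma2-{size}-mix-{resolution}"

# Enumerate the full sweep a team might plan resources against.
sweep = [repo_id(s, r) for s in SIZES for r in RESOLUTIONS]
```

Each entry in `sweep` pairs with a prompt from `TASK_PROMPTS` to form one benchmark run, e.g. `repo_id("3b", 448)` with the `"ocr"` prompt.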
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info