Google PaliGemma 2 Mix: new vision-language models (3B/10B/28B) across multiple resolutions
AI Impact Summary
Google introduces PaliGemma 2 Mix, a set of fine-tuned checkpoints from the PaliGemma 2 family (built on SigLIP and Gemma 2), available in 3B, 10B, and 28B parameter sizes at 224x224, 448x448, and 896x896 resolutions. Because the mix variants are fine-tuned on a mixture of vision-language tasks such as OCR, captioning, and VQA, they serve as quick proxies for how the base models will perform after downstream fine-tuning, letting teams gauge expected performance without task-specific training. The release reinforces the family's emphasis on transfer learning rather than general-purpose chat, with explicit demonstrations of subtask capabilities via prompts such as "ocr" and "caption". For engineering teams, it provides concrete model identifiers and task prompts for benchmarking and for planning resource allocation across the different sizes and resolutions.
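To make the "model identifiers and task prompts" point concrete, here is a minimal sketch of how a team might enumerate candidate checkpoints and prompts for a benchmarking sweep. The repo-id pattern `google/paligemma2-{size}-mix-{res}` and the exact prompt strings are assumptions inferred from the sizes, resolutions, and prompts named above; verify the actual identifiers on the model hub before use.

```python
# Assumed sizes and resolutions, taken from the summary above.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448, 896)

# Task prompts mentioned for the mix checkpoints ("ocr", "caption").
# The "answer en {question}" VQA form is a hypothetical placeholder.
TASK_PROMPTS = {
    "ocr": "ocr",
    "caption": "caption en",
    "vqa": "answer en {question}",
}

def repo_id(size: str, resolution: int) -> str:
    """Build a candidate hub repo id for a mix checkpoint.

    The naming pattern is an assumption, not a confirmed identifier.
    """
    if size not in SIZES or resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported size/resolution: {size}/{resolution}")
    return f"google/paligemma2-{size}-mix-{resolution}"

# Enumerate the full sweep a team might plan resources against.
sweep = [repo_id(s, r) for s in SIZES for r in RESOLUTIONS]
```

Each entry in `sweep` pairs with a prompt from `TASK_PROMPTS` to form one benchmark run, e.g. `repo_id("3b", 448)` with the `"ocr"` prompt.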
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info