Cohere releases Aya Vision 8B and 32B open-weight multilingual vision-language models
AI Impact Summary
Cohere announces the Aya Vision family of open-weight multilingual vision-language models in 8B and 32B sizes, targeting multimodal understanding across 23 languages. The architecture uses dynamic image tiling for high-resolution inputs, a SigLIP2-patch14-384 initialization for the vision encoder, and Pixel Shuffle downsampling to reduce vision token counts before the vision-language connector and LLM decoder. Aya Vision also ships open benchmarks (AyaVisionBench and mWildVision) to standardize evaluation across languages, enabling faster experimentation and comparison against peer models. Open weights lower barriers for research and early product teams, but production deployment will still demand careful resource planning for fine-tuning, model merging, and serving at scale.
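Pixel Shuffle downsampling trades spatial resolution for channel depth: each small block of neighboring vision tokens is folded into a single wider token, cutting the token count by the square of the downsampling factor before tokens reach the connector. The snippet below is a minimal NumPy sketch of that idea, not Cohere's implementation; the grid size, channel width, and function name are illustrative assumptions.

```python
import numpy as np

def pixel_shuffle_downsample(tokens: np.ndarray, factor: int = 2) -> np.ndarray:
    """Fold each (factor x factor) block of patch embeddings into one token.

    tokens: (H, W, C) grid of vision-encoder patch embeddings.
    returns: (H // factor, W // factor, C * factor**2) grid, i.e. the
             sequence length shrinks by factor**2 while channels grow.
    Hypothetical sketch; real connectors may add projections or norms.
    """
    h, w, c = tokens.shape
    assert h % factor == 0 and w % factor == 0, "grid must divide evenly"
    # Split each spatial axis into (blocks, within-block) indices.
    x = tokens.reshape(h // factor, factor, w // factor, factor, c)
    # Bring the within-block indices next to the channel axis...
    x = x.transpose(0, 2, 1, 3, 4)
    # ...then flatten them into the channel dimension.
    return x.reshape(h // factor, w // factor, c * factor * factor)

# Example: a 24x24 grid of 1024-dim patch embeddings (576 tokens)
grid = np.random.rand(24, 24, 1024)
out = pixel_shuffle_downsample(grid, factor=2)
print(out.shape)  # (12, 12, 4096): 144 tokens, 4x fewer than before
```

A projection layer in the connector would then map the widened 4096-dim tokens back down to the LLM's embedding width, so the decoder sees 4x fewer, denser vision tokens.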
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info