Cohere releases Aya Vision 8B/32B open-weight multilingual vision-language models with AyaVisionBench and mWildVision benchmarks
AI Impact Summary
Aya Vision introduces open-weight multilingual vision-language models at 8B and 32B parameters, covering 23 languages, so researchers and product teams can run and adapt multilingual multimodal capabilities themselves. Training proceeds in two stages (vision-language alignment, then supervised fine-tuning) and is followed by a model-merging step; together these are reported to improve cross-lingual and multimodal performance, as measured on the AyaVisionBench and mWildVision benchmarks. On the architecture side, vision encoder initialization from SigLIP2-patch14-384, dynamic image tiling, and Pixel Shuffle downsampling address high-resolution image processing and deployment efficiency, but adopting the models will require careful integration with existing pipelines and LLM decoders.
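To make the Pixel Shuffle downsampling step more concrete, here is a minimal sketch. It assumes the term refers to the space-to-depth rearrangement commonly used in vision-language models, where each block of neighboring image tokens is merged into a single token with more channels. The function name, the factor of 2, the toy grid size, and the channel ordering are all illustrative assumptions and are not taken from the Aya Vision release.

```python
def pixel_shuffle_downsample(grid, factor=2):
    """Space-to-depth: merge each factor x factor block of tokens into one token.

    grid: H x W nested list; each cell is a list of C floats (one token's features).
    Returns an (H/factor) x (W/factor) grid whose cells hold C * factor**2 features.
    Hypothetical helper for illustration; real models do this on tensors.
    """
    height, width = len(grid), len(grid[0])
    if height % factor or width % factor:
        raise ValueError("grid sides must be divisible by factor")
    out = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            merged = []
            # Concatenate the features of every token in this block.
            for dy in range(factor):
                for dx in range(factor):
                    merged.extend(grid[top + dy][left + dx])
            row.append(merged)
        out.append(row)
    return out


# Toy 4x4 grid of 1-channel tokens numbered 0..15 (assumed sizes, not Aya Vision's).
grid = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
small = pixel_shuffle_downsample(grid, factor=2)

print(len(grid) * len(grid[0]))    # tokens before: 16
print(len(small) * len(small[0]))  # tokens after: 4
print(small[0][0])                 # [0.0, 1.0, 4.0, 5.0]
```

With a factor of 2, the number of image tokens passed to the language model drops by 4x while each token carries 4x as many features, which is the general trade-off behind the deployment efficiency mentioned above.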
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info