Cohere releases Aya Vision 8B and 32B open-weight multilingual vision-language models
AI Impact Summary
Cohere announces the Aya Vision family of open-weight multilingual vision-language models in 8B and 32B sizes, targeting multimodal understanding across 23 languages. The architecture uses dynamic image tiling for high-resolution inputs, a SigLIP2-patch14-384 initialization for the vision encoder, and Pixel Shuffle downsampling to reduce vision token counts before the vision-language connector and LLM decoder. Aya Vision also ships open benchmarks (AyaVisionBench and mWildVision) to standardize evaluation across languages, enabling faster experimentation and comparison against peer models. Open weights lower barriers for research and early product teams, but production deployment will still demand careful resource planning for fine-tuning, model merging, and serving at scale.
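Pixel Shuffle downsampling trades spatial resolution for channel depth: each small block of neighboring vision tokens is folded into a single wider token, cutting the token count by the square of the downsampling factor before tokens reach the connector. The snippet below is a minimal NumPy sketch of that idea, not Cohere's implementation; the grid size, channel width, and function name are illustrative assumptions.

```python
import numpy as np

def pixel_shuffle_downsample(tokens: np.ndarray, factor: int = 2) -> np.ndarray:
    """Fold each (factor x factor) block of patch embeddings into one token.

    tokens: (H, W, C) grid of vision-encoder patch embeddings.
    returns: (H // factor, W // factor, C * factor**2) grid, i.e. the
             sequence length shrinks by factor**2 while channels grow.
    Hypothetical sketch; real connectors may add projections or norms.
    """
    h, w, c = tokens.shape
    assert h % factor == 0 and w % factor == 0, "grid must divide evenly"
    # Split each spatial axis into (blocks, within-block) indices.
    x = tokens.reshape(h // factor, factor, w // factor, factor, c)
    # Bring the within-block indices next to the channel axis...
    x = x.transpose(0, 2, 1, 3, 4)
    # ...then flatten them into the channel dimension.
    return x.reshape(h // factor, w // factor, c * factor * factor)

# Example: a 24x24 grid of 1024-dim patch embeddings (576 tokens)
grid = np.random.rand(24, 24, 1024)
out = pixel_shuffle_downsample(grid, factor=2)
print(out.shape)  # (12, 12, 4096): 144 tokens, 4x fewer than before
```

A projection layer in the connector would then map the widened 4096-dim tokens back down to the LLM's embedding width, so the decoder sees 4x fewer, denser vision tokens.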
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info