Hugging Face: Vision Language Models enable any-to-any modalities and MoE decoders (Qwen 2.5 Omni, Kimi-VL-A3B-Thinking) | SignalBreak