ChatGPT adds multimodal capabilities: vision, audio input, and speech output
AI Impact Summary
ChatGPT now supports multimodal input and output: it can interpret images and audio and reply with synthesized speech. For technical teams, this expands the interaction patterns that can be embedded in assistants, consumer apps, and internal tools, potentially improving task accuracy and user satisfaction in visually rich or hands-free contexts. Key considerations include privacy and moderation for user-provided media, latency in real-time dialogue, and alignment with existing consent and data-retention policies.
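As an illustration of how a team might embed image input in an assistant, the sketch below builds a multimodal user message in the style of the OpenAI chat-completions message format; the helper name, the model name, and the example URL are assumptions for illustration, not part of this announcement.

```python
# Sketch of a multimodal chat message, assuming the OpenAI chat-completions
# content-part format (a list mixing "text" and "image_url" parts).
def build_multimodal_message(prompt_text: str, image_url: str) -> dict:
    """Combine a text prompt and an image reference in one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            # Image input: a URL (or base64 data URI) for the model to interpret.
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_multimodal_message(
    "What is shown in this photo?",
    "https://example.com/photo.jpg",  # placeholder URL
)

# The message would then be sent with a vision-capable model, e.g. (hypothetical):
#   client.chat.completions.create(model="gpt-4o", messages=[message])
```

Keeping the payload construction separate from the API call makes it easy to run moderation or consent checks on user-provided media before anything leaves your system.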
Affected Systems
Business Impact
Organizations can deploy ChatGPT-powered assistants that interpret images and audio and respond verbally, enabling richer customer support, accessibility features, and hands-free workflows. Doing so, however, will require updated data-governance, consent, and moderation policies for multimodal content.
- Date: not specified
- Change type: capability
- Severity: medium