ChatGPT can now see, hear, and speak: multimodal input and speech output
AI Impact Summary
ChatGPT now supports multimodal interaction, enabling users to supply images and audio and receive spoken responses. This expands capabilities for visual data analysis, voice-enabled assistants, and accessibility use cases, which will drive new integration opportunities in customer support and internal automation. Technical teams should plan for image/audio data handling, consent and privacy controls, latency considerations, and moderation for uploaded content, as well as UI changes to capture new input modalities.
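As a concrete illustration of the image-handling work described above, the sketch below builds a multimodal chat message containing both text and an inline base64-encoded image. It assumes the content-list shape used by OpenAI's Chat Completions image input (`type: "text"` / `type: "image_url"` entries); treat the exact field names as an assumption to verify against the current API documentation.

```python
import base64
import json


def build_image_message(prompt: str, image_bytes: bytes,
                        mime: str = "image/png") -> dict:
    """Build a user message pairing a text prompt with an inline image.

    The image is embedded as a base64 data URL, one of the formats
    accepted for image input (assumed shape; verify against the API docs).
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }


# Example: attach raw PNG bytes (placeholder here) to a question.
msg = build_image_message("What trend does this chart show?", b"\x89PNG...")
print(json.dumps(msg, indent=2))
```

Centralizing payload construction like this gives teams one place to enforce the consent, size, and moderation checks mentioned above before any image leaves the client.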
Affected Systems
Business Impact
Enables multimodal workflows (image/audio input and voice output) for customer support and accessibility, but requires privacy controls, data handling policies, and latency management.
- Date: Not specified
- Change type: Capability
- Severity: Medium