ChatGPT adds vision, speech input, and audio output capabilities
AI Impact Summary
ChatGPT now supports multimodal input and output: vision (image analysis), audio input (speech-to-text), and audio output (text-to-speech). This expands the interaction surface beyond text-only exchanges, enabling voice-first applications and visual document processing. Teams using ChatGPT integrations should evaluate whether these modalities introduce new security and privacy boundaries (for example, how audio data is handled), raise quality concerns (for example, voice synthesis quality), or unlock new product capabilities.
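One way to make the evaluation above concrete is a per-modality review checklist. The following is a minimal sketch, not part of any ChatGPT API: the `Modality` enum, the `REVIEW_QUESTIONS` table, and the `review_checklist` function are hypothetical names introduced for illustration, and the questions are placeholders to be replaced with a team's own policies.

```python
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    VISION = "vision"              # image analysis
    AUDIO_INPUT = "audio_input"    # speech-to-text
    AUDIO_OUTPUT = "audio_output"  # text-to-speech


# Hypothetical review questions per modality (assumption: adapt to your own policies).
REVIEW_QUESTIONS = {
    Modality.VISION: ["Can uploaded images contain personal or confidential data?"],
    Modality.AUDIO_INPUT: ["How are voice recordings handled, stored, and retained?"],
    Modality.AUDIO_OUTPUT: ["Is synthesized voice quality acceptable for end users?"],
}


def review_checklist(enabled):
    """Return (modality, question) pairs for each enabled modality that has questions."""
    return [
        (modality, question)
        for modality in enabled
        for question in REVIEW_QUESTIONS.get(modality, [])
    ]


if __name__ == "__main__":
    # Example: an integration that adds voice input on top of text.
    for modality, question in review_checklist([Modality.TEXT, Modality.AUDIO_INPUT]):
        print(f"[{modality.value}] {question}")
```

Text-only integrations produce an empty checklist in this sketch, which reflects the point that the review is only needed for the newly added modalities.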
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium