OpenAI Forces GPT-3.5 Turbo Migration as Google Cuts Vertex AI Endpoints
The AI provider landscape shifted dramatically this week with OpenAI finally pulling the plug on GPT-3.5 Turbo and Google simultaneously culling multiple Vertex AI endpoints. Both deprecations hit on 30 June 2025, creating a perfect storm of forced migrations that will impact thousands of production applications.
What's changing with OpenAI's GPT-3.5 Turbo deprecation?
OpenAI has officially retired GPT-3.5 Turbo as of 30 June 2025, marking the end of an era for one of the most widely deployed language models in production. This isn't just another routine deprecation: GPT-3.5 Turbo powered countless applications across chatbots, content generation, and summarisation workflows, making this sunset particularly disruptive.
The migration path is clear but not trivial: applications must move to GPT-4o-mini or newer models. Whilst GPT-4o-mini offers superior performance and capabilities, the transition requires careful consideration of cost implications and potential behaviour changes. GPT-4o-mini operates differently from its predecessor, particularly in instruction following and output formatting, meaning teams can't simply swap endpoints and expect identical results.
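One way to keep the swap manageable is to centralise model selection rather than hard-coding the retired name at every call site. As a minimal sketch (the model IDs come from this deprecation notice; the helper itself is hypothetical), a lookup table can substitute replacements in one place:

```python
# Hypothetical migration helper: map retired OpenAI model IDs to their
# suggested replacements. IDs are taken from the deprecation notice;
# extend the table as further variants are retired.
RETIRED_MODELS = {
    "gpt-3.5-turbo": "gpt-4o-mini",
}

def resolve_model(requested: str) -> str:
    """Return a supported model ID, substituting retired ones."""
    return RETIRED_MODELS.get(requested, requested)
```

Because GPT-4o-mini behaves differently in instruction following and output formatting, it is worth logging whenever a substitution actually occurs so regressions can be traced back to the swap.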
The timing creates additional pressure. With the deprecation effective immediately, any applications still calling GPT-3.5 Turbo endpoints will receive errors. This affects not just direct OpenAI API users but also applications built on platforms like Together AI, Elastic, and Hugging Face that offered GPT-3.5 Turbo access through their services. The ripple effect extends across the entire AI ecosystem, forcing coordinated updates across multiple provider integrations.
Why Google's Vertex AI endpoint cuts matter more than you think
Google's decision to deprecate multiple image and video generation endpoints in Vertex AI represents a significant consolidation of their generative AI offerings. The affected endpoints include the imagegeneration and veo models, forcing users to migrate to newer alternatives like gemini-2.5-flash-image or veo-3.1-generate-001.
This move signals Google's strategic shift towards unified model families rather than maintaining separate specialised endpoints. Whilst the migration paths exist, they're not straightforward replacements. The newer Gemini-based image generation models operate with different parameter structures and capabilities, requiring code changes beyond simple endpoint swaps.
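To illustrate why the move goes beyond renaming an endpoint, here is a hedged sketch of a request translator. Only the replacement model ID comes from the deprecation notice; every field name below is an illustrative placeholder, not the documented Vertex AI schema, so consult Google's migration guide for the real shapes:

```python
# Illustrative only: the actual Vertex AI request schemas differ. This
# sketches the kind of restructuring a migration typically involves,
# with hypothetical field names on both sides.
def translate_image_request(old_request: dict) -> dict:
    """Convert a legacy-style image request into a new-style payload."""
    return {
        # Replacement model named in the deprecation notice.
        "model": "gemini-2.5-flash-image",
        # The prompt text itself usually carries over unchanged.
        "prompt": old_request["prompt"],
        # Hypothetical example of a renamed parameter.
        "candidate_count": old_request.get("sample_count", 1),
    }
```

The point is that each legacy parameter needs an explicit decision (carry over, rename, or drop), which is why a translation layer beats scattering edits across call sites.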
The timing coincides with Google's broader Vertex AI Model Garden cleanup, which also saw Claude 3 Opus and Mistral Nemo removed from the platform. This coordinated deprecation suggests Google is streamlining its third-party model offerings, potentially signalling a focus on first-party Gemini models over external partnerships. For developers who built workflows around these specific model combinations, the changes require fundamental architecture decisions about which providers to standardise on.
Worth watching: capability expansions and platform improvements
Azure OpenAI introduced input fidelity controls and partial image streaming to their image APIs, addressing long-standing developer requests for more granular control over image generation. The input_fidelity parameter allows fine-tuning for specific use cases like preserving facial features and brand identity, whilst partial streaming enables real-time feedback during generation. These enhancements don't require immediate action but offer significant workflow improvements for teams building image-heavy applications.
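A hedged sketch of where the new control might sit in a request follows. Only the input_fidelity parameter name comes from the release notes; the surrounding payload shape and the "high"/"low" values are assumptions to verify against Azure's API reference:

```python
# Sketch of an image-edit payload using the new input_fidelity control.
# Only the parameter name is confirmed by the release notes; the other
# fields and the accepted values are assumptions.
def build_image_edit_request(prompt: str, preserve_detail: bool) -> dict:
    return {
        "prompt": prompt,
        # Assumed semantics: a high setting biases the edit towards
        # preserving input details such as faces and brand marks.
        "input_fidelity": "high" if preserve_detail else "low",
    }
```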
Replicate removed their monthly spend limit feature, eliminating a key cost control mechanism that many users relied on to manage unexpected charges. This change forces users to implement alternative budget management strategies, either through external monitoring or by contacting Replicate support for custom solutions. The removal suggests Replicate is moving towards more sophisticated billing models, though the immediate impact leaves cost-conscious users without their primary spending safeguard.
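One of the external strategies mentioned above is a client-side budget guard that refuses requests once a self-imposed ceiling is reached. A minimal sketch (the class and its thresholds are placeholders, not anything Replicate provides):

```python
class BudgetGuard:
    """Client-side monthly spend tracker, standing in for the removed
    platform-side limit. Costs must be recorded by the caller."""

    def __init__(self, monthly_limit_usd: float):
        self.monthly_limit_usd = monthly_limit_usd
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> None:
        """Record the cost of a completed request."""
        self.spent_usd += cost_usd

    def allow(self, estimated_cost_usd: float) -> bool:
        """Return True if the next request still fits within the budget."""
        return self.spent_usd + estimated_cost_usd <= self.monthly_limit_usd
```

A guard like this only sees what the caller records, so it complements rather than replaces provider-side billing data, which remains the source of truth for actual charges.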
Anthropic's Claude Opus 4 achieved general availability on Vertex AI, expanding geographic access without requiring immediate changes from existing users. This represents positive momentum for Anthropic's enterprise adoption, particularly as organisations seek alternatives to OpenAI models. The expanded availability through Google's infrastructure provides additional deployment options for teams already invested in the Google Cloud ecosystem.
AI21 Labs released Jamba 1.7 with enhanced grounding capabilities and self-hosted deployment options. The update directly improves the model's ability to answer questions and follow instructions whilst providing greater control for organisations with strict data privacy requirements. The self-hosted option addresses a growing enterprise demand for on-premises AI deployments, positioning Jamba as a viable alternative for security-conscious implementations.
Replicate expanded their deployment metrics view from 2 hours to 24 hours with 15-minute aggregation intervals. This enhancement provides much better visibility into deployment trends and stability patterns without requiring user intervention. The change addresses previous limitations where short monitoring windows made it difficult to identify longer-term performance issues.
Quick hits
Perplexity added SEC filings filtering for financial research and detailed cost breakdowns per API request. AWS Bedrock expanded Prompt Management and Flows to four new regions: Europe (Milan), Europe (Spain), Asia Pacific (Hyderabad), and Asia Pacific (Osaka). Vertex AI Agent Garden gained tag filtering support for improved agent discovery. Replicate shipped documentation improvements, environment variable support, and playground bug fixes.
The week ahead: critical migration deadlines
With both OpenAI and Google deprecations already in effect, teams should prioritise immediate impact assessment and emergency migration planning. The GPT-3.5 Turbo sunset affects applications across multiple platforms, requiring coordinated updates across potentially dozens of integrations.
For Google Vertex AI users, the image and video generation endpoint changes demand careful testing of replacement models before production deployment. The newer Gemini-based alternatives offer different capabilities and parameter structures that may require workflow adjustments.
The simultaneous timing of these major deprecations creates resource allocation challenges for development teams. Organisations using both OpenAI and Google services face competing migration priorities, making this week critical for establishing clear remediation timelines and testing protocols.