Google Forces Vertex AI Migration as Critical Deadlines Loom
Google is forcing a major migration across its Vertex AI platform, with multiple critical deprecations hitting this week alongside Cohere's retirement of key embedding models. If you're running production workloads on these platforms, you've got days, not weeks, to act.
What's changing with Vertex AI endpoints?
Google has announced the deprecation of several critical Vertex AI endpoints, creating an urgent migration scenario for developers. The video generation preview endpoints for Imagen and Veo are being sunset on 2 April 2026, forcing immediate migration to production-ready Veo models. Simultaneously, the Vertex AI Workbench v2 is receiving a major update (M140) that migrates to Debian 12 and Python 3.12 whilst removing support for JupyterLab 3, TensorFlow, and PyTorch frameworks.
This isn't just a simple version bump. The removal of framework support means existing notebooks and workflows will break unless updated. The new date-based versioning scheme signals Google's shift towards more frequent, potentially disruptive updates. Teams running ML pipelines on Workbench v2 need to audit their dependencies immediately and test compatibility with the new Python 3.12 environment.
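One practical first step is to scan existing requirements files for packages that M140 no longer preinstalls. The sketch below is a minimal, hypothetical audit helper, not a Google-provided tool; the set of removed frameworks reflects only what the release notes above name.

```python
# Sketch: flag requirements that Workbench M140 no longer preinstalls.
# REMOVED reflects the frameworks named in the release notes; anything
# flagged must be pinned or installed explicitly after migrating.
REMOVED = {"tensorflow", "torch", "jupyterlab"}

def audit_requirements(lines):
    """Return requirement names that need explicit handling post-M140."""
    flagged = []
    for line in lines:
        # Strip version specifiers to get the bare package name.
        name = line.strip().split("==")[0].split(">=")[0].lower()
        if name in REMOVED:
            flagged.append(name)
    return flagged

print(audit_requirements(["numpy==1.26.4", "torch>=2.1", "jupyterlab==3.6.0"]))
# → ['torch', 'jupyterlab']
```

Running this across every notebook environment's requirements gives a quick inventory of which pipelines will break on the new image.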
The timing is particularly challenging given the simultaneous endpoint deprecations. Applications relying on the deprecated video generation endpoints will simply stop working after 2 April 2026. There's no graceful degradation here - it's migrate or fail. The recommended migration path to production Veo models requires code changes and potentially different pricing structures, making this more than a simple endpoint swap.
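Because the deprecated endpoints fail hard rather than degrade, one defensive pattern is to centralise model selection behind a lookup that warns on deprecated IDs. The model names below are illustrative placeholders, not confirmed Vertex AI identifiers; a real migration should use the IDs from Google's deprecation notice.

```python
# Sketch: route requests away from deprecated preview model IDs.
# Both the preview and replacement IDs here are hypothetical
# placeholders, not confirmed Vertex AI model names.
DEPRECATED = {
    "imagen-video-preview": "veo-production",  # hypothetical IDs
    "veo-preview": "veo-production",
}

def resolve_model(model_id):
    """Map a deprecated preview model to its production successor."""
    if model_id in DEPRECATED:
        replacement = DEPRECATED[model_id]
        print(f"warning: {model_id} is deprecated; using {replacement}")
        return replacement
    return model_id
```

Centralising the mapping means the April cutover becomes a one-line change rather than a hunt through every call site.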
Cohere forces Embed v2.0 migration by Friday
Cohere is retiring its Embed v2.0 family and Aya Expanse 8B models on 4 April 2026, creating another critical deadline this week. This affects embed-english-v2.0, embed-english-light-v2.0, embed-multilingual-v2.0, and the c4ai-aya-expanse-8b and c4ai-aya-vision-8b models.
The migration path leads to embed-english-v3.0, embed-multilingual-v3.0, and the newer embed-v4.0 models, alongside command-r7b-12-2024 for language tasks. However, this isn't a drop-in replacement scenario. The newer embedding models have different dimensionalities and performance characteristics, meaning existing vector databases and similarity search implementations may need recalibration.
For production systems, this represents a significant operational risk. Vector embeddings are often deeply integrated into search, recommendation, and RAG systems. Changing the underlying embedding model can affect search relevance, similarity thresholds, and downstream application behaviour. Teams need to run parallel testing with the new models before the Friday deadline to ensure consistent performance.
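Because the v3/v4 embeddings have different dimensionalities and score distributions, a similarity threshold tuned on v2 scores will not transfer directly. One simple recalibration approach is quantile matching: find where the old threshold sat in the old score distribution, and pick the new-model score at the same quantile. The helper below is a self-contained sketch of that idea, not part of Cohere's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recalibrate_threshold(old_scores, new_scores, old_threshold):
    """Pick the new-model score at the quantile the old threshold occupied.

    old_scores/new_scores: similarity scores for the SAME query-document
    pairs, computed with the old and new embedding models respectively.
    """
    rank = sum(s < old_threshold for s in old_scores)
    quantile = rank / len(old_scores)
    new_sorted = sorted(new_scores)
    idx = min(int(quantile * len(new_sorted)), len(new_sorted) - 1)
    return new_sorted[idx]
```

Running a held-out set of query-document pairs through both models in parallel, then recalibrating thresholds this way, gives a concrete pass/fail check before the Friday cutover.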
Weaviate delivers major performance improvements
Whilst others are forcing migrations, Weaviate is delivering substantial improvements to its vector database platform. The release of version 1.36.9 introduces significant HNSW optimisations, including a new sparse implementation for visited lists and enhanced dequeuing during backups. More importantly, v1.37.0-rc.0 brings extensible tokenizers, incremental backups, and the ability to drop vector indices from schemas.
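The sparse visited-list change is easy to understand conceptually: a dense visited list allocates memory proportional to the whole graph on every query, while a sparse one only pays for the nodes actually touched. The Python sketch below illustrates the trade-off only; Weaviate's actual implementation is in Go and differs in detail.

```python
class SparseVisited:
    """Set-backed visited list: memory scales with nodes actually
    visited during one HNSW search, not with total graph size."""
    def __init__(self):
        self._seen = set()

    def visit(self, node_id):
        """Return True if the node was newly visited."""
        if node_id in self._seen:
            return False
        self._seen.add(node_id)
        return True

class DenseVisited:
    """Bitmap visited list: O(graph size) allocation per query,
    but constant-time lookups with no hashing overhead."""
    def __init__(self, n_nodes):
        self._seen = [False] * n_nodes

    def visit(self, node_id):
        if self._seen[node_id]:
            return False
        self._seen[node_id] = True
        return True
```

For searches that touch a small fraction of a large graph, the sparse variant avoids the per-query allocation cost, which is the scenario the optimisation targets.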
The 'Alter Schema - Drop vector index' feature addresses a long-standing operational pain point. Previously, removing vector indices required complex workarounds or complete schema rebuilds. This new capability allows for more efficient storage management and faster query performance by removing unused indices without disrupting active workloads.
The extensible tokenizer system is particularly significant for multilingual deployments. Custom tokenizers can now be plugged into the system, enabling better handling of domain-specific vocabularies and non-English languages. Combined with the incremental backup improvements, this positions Weaviate as a more enterprise-ready platform for production vector search workloads.
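Weaviate's plugin interface isn't detailed here, but the kind of tokenizer you might plug in is straightforward to sketch. The example below is a standalone illustration of a domain-aware tokenizer that keeps hyphenated technical terms intact instead of splitting them; the term list and function name are assumptions for demonstration only.

```python
import re

def domain_tokenizer(text, keep_terms=frozenset({"gpt-4", "k8s"})):
    """Lowercase whitespace tokenizer that preserves known hyphenated
    domain terms rather than splitting them on '-' or '/'."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if word in keep_terms:
            tokens.append(word)          # keep the domain term whole
        else:
            tokens.extend(t for t in re.split(r"[-/]", word) if t)
    return tokens

print(domain_tokenizer("Deploy GPT-4 on k8s clusters."))
# → ['deploy', 'gpt-4', 'on', 'k8s', 'clusters']
```

A default tokenizer would split "gpt-4" into two meaningless tokens; controlling this behaviour per-collection is what makes the extensibility valuable for domain-specific search.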
Worth watching this week
Google releases Gemma 4 multimodal models: DeepMind has released the Gemma 4 family with support for image, text, and audio inputs. The models come with Apache 2 licences and leverage architectural innovations like Per-Layer Embeddings for improved efficiency. This represents a significant step towards truly open frontier AI models, though migration from existing Gemma versions will require code changes and infrastructure adjustments.
Together AI launches Wan 2.7 video suite: A four-model video generation platform providing enhanced control over video creation, continuation, and editing workflows. The rollout begins with text-to-video capabilities, offering improved resolution and duration controls compared to fragmented existing solutions.
LocalAI reaches production readiness: Version 4.1.0 introduces distributed cluster support, user authentication, and fine-tuning capabilities. The addition of smart routing and autoscaling makes this a viable alternative for organisations seeking to deploy AI models without cloud dependencies.
AWS expands CloudWatch auto-enablement: CloudFront and three additional resource types now support automatic logging configuration, reducing operational overhead for infrastructure monitoring.
Anthropic increases Claude token limits: The Message Batches API now supports up to 300,000 tokens on Claude Opus and Sonnet models, enabling longer-form content generation and complex reasoning tasks.
Quick hits
• Microsoft introduces pay-as-you-go Codex pricing through ChatGPT Business and Enterprise
• Amazon SageMaker Data Agent adds charting and materialised view support for advanced data analysis
• Bedrock Guardrails expands cross-account safeguards for consistent security policies
• Deepgram speech models integrate natively with Together AI for real-time voice applications
• Aurora DSQL launches .NET and Rust connectors expanding language ecosystem support
• OpenAI secures $122 billion in new funding indicating continued aggressive AI investment
The week ahead: critical deadlines approaching
The most immediate deadline is Google's Vertex AI video generation endpoint deprecation on 2 April. Close behind is Cohere's Embed v2.0 retirement on 4 April: teams using these models have until Friday to complete their migration or face service disruption.
Longer-term, Google's Gemini 2.5 model retirement on 16 October 2026 requires strategic planning. Applications using Gemini 2.5 Pro, Flash-Lite, or Flash models need migration roadmaps to newer alternatives like Gemini 3.1 or GPT-4o-mini.
Next week, watch for potential announcements around OpenAI's funding deployment and whether Google's aggressive deprecation schedule signals broader platform consolidation. The simultaneous timing of these critical migrations suggests coordinated platform evolution across major AI providers.