xAI Outages and Google's Vertex AI Overhaul: Week of 19 January 2026
This week delivered a stark reminder that even the biggest AI providers aren't immune to spectacular failures. xAI's authentication system collapsed, leaving users locked out of its web services, whilst Google pulled the plug on multiple Vertex AI models with barely any notice. When critical infrastructure fails and major deprecations land in the same week, it's a proper mess.
The Big Moves
xAI's Authentication Meltdown
xAI experienced a complete authentication system failure this week, with both their single sign-on service and Grok web interface becoming unavailable. The SSO outage is particularly damaging because SSO gates every xAI service that relies on web sign-in, not just an individual component. Users reported being unable to sign in on the web by any method, effectively locking them out of their accounts and halting work that depends on them.
What makes this worse is that the timing coincides with xAI's deprecation of their Chat Completions API. Developers already dealing with a forced migration to new API methods lost web access to the platform just as they needed to test their changes. The API and mobile services remained operational, which suggests the failure was confined to the authentication and web layer rather than the platform as a whole, but that's cold comfort for users relying on web access.
The lack of clear communication about resolution timelines has left users in limbo. For organisations evaluating xAI as a primary provider, this incident raises serious questions about reliability and incident response capabilities.
Google's Vertex AI Purge
Google dropped a bombshell on Vertex AI users with the simultaneous deprecation of multiple major models and endpoints, effective 23 January 2026. The casualties include Imagen generation endpoints, Veo 3.1 Lite and, surprisingly, Anthropic's Claude 3 Sonnet from the Model Garden. This isn't gradual sunsetting; it's a coordinated clearout.
The Imagen and Veo deprecations are particularly brutal because they give users essentially no migration time. Applications using these models for image and video generation will simply stop working after 23 January unless developers have already migrated to alternative endpoints. Google is pushing users towards newer models, but the compressed timeline makes this feel more like a forced march than a planned transition.
The Claude 3 Sonnet removal from Model Garden signals broader shifts in Google's partnership strategy. Users who've built workflows around Claude access through Vertex AI now need to either migrate to direct Anthropic access or switch to Google's own models entirely. This fragmentation of model access points is becoming a real headache for developers trying to maintain consistent AI capabilities.
AWS Expands Whilst Others Stumble
Whilst xAI and Google dealt with outages and deprecations, AWS quietly expanded capabilities across multiple services. The introduction of .ai domain support in Route 53 might seem minor, but it's perfectly timed as AI companies scramble for memorable domains. More significantly, the expansion of Amazon Neptune Analytics to seven new regions and the enhanced SageMaker HyperPod debugging capabilities show AWS continuing to build out infrastructure whilst competitors deal with stability issues.
The Reserved Tier availability for Claude Sonnet 4.5 in AWS GovCloud is particularly noteworthy. Government AI adoption has been cautious, but having dedicated pricing tiers suggests AWS sees significant demand from regulated sectors. This could accelerate enterprise AI adoption in environments where cost predictability matters more than cutting-edge features.
Worth Watching
Replicate's Multi-Model Crisis
Replicate suffered a critical incident affecting both inference and training systems across multiple models. This type of broad platform failure is exactly what developers fear when relying on third-party AI infrastructure. The lack of specific resolution timelines suggests this was a complex systems issue rather than a simple service restart.
Together AI's Performance Optimisation Guide
Together AI published detailed guidance on optimising inference speed and costs, focusing on practical techniques like quantization and regional proxies. This type of operational knowledge sharing is valuable, especially as organisations realise that model performance isn't just about accuracy but also about deployment efficiency.
Anthropic's Constitutional Update
Anthropic released a new constitution for Claude, representing a significant shift in their approach to AI alignment and behaviour. This isn't just a policy update; it's a fundamental change to how Claude is trained and responds. Users should expect subtle but meaningful changes in model behaviour.
Qdrant's Docker Snapshot Issues
Qdrant v1.16.3 is experiencing snapshot restoration problems in Docker environments, affecting backup and disaster recovery workflows. For teams using Qdrant for production vector storage, this is a critical issue that could impact data recovery capabilities.
Quick Hits
- Google Colab Enterprise expanded with BigFrames, BigQuery ML, and Managed Spark integration
- AWS Bedrock AgentCore Browser now supports custom extensions for enhanced agent functionality
- Microsoft Differential Transformer V2 released with improved inference efficiency and training stability
- Hugging Face analysis highlights China's open-source AI ecosystem explosion following DeepSeek's success
- AWS EC2 U7i instances now available in Singapore region for high-memory workloads
The Week Ahead
The 23 January deadline for Google's Vertex AI model deprecations is imminent. If you're using the Imagen generation endpoints, Veo 3.1 Lite, or Claude 3 Sonnet through Vertex AI, this is your last chance to migrate before services stop working. Don't assume Google will extend these deadlines.
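If you're not sure what your codebase actually calls, a quick audit helps. The following is a minimal sketch, not an official tool: the 23 January 2026 cutoff comes from the announcement above, but the match patterns and the example model names are placeholders invented for illustration, not real Vertex AI model IDs. Swap in the identifiers your own applications use.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deprecation:
    pattern: str  # hypothetical substring to look for in a model identifier
    cutoff: date  # date after which the model or endpoint stops working


# The cutoff date is taken from the announcement. The patterns are
# placeholders, NOT official model IDs: replace them with your own.
DEPRECATIONS = [
    Deprecation("imagen", date(2026, 1, 23)),
    Deprecation("veo-3.1-lite", date(2026, 1, 23)),
    Deprecation("claude-3-sonnet", date(2026, 1, 23)),
]


def audit(models_in_use, today):
    """Return (model, cutoff, days_left) for every model matching a deprecation."""
    findings = []
    for model in models_in_use:
        name = model.lower()
        for dep in DEPRECATIONS:
            if dep.pattern in name:
                findings.append((model, dep.cutoff, (dep.cutoff - today).days))
                break  # one finding per model is enough
    return findings


if __name__ == "__main__":
    # Made-up identifiers standing in for whatever your config contains.
    in_use = [
        "imagen-generate-example",
        "claude-3-5-sonnet-example",
        "veo-3.1-lite-example",
    ]
    for model, cutoff, days_left in audit(in_use, date(2026, 1, 20)):
        print(f"{model}: stops working after {cutoff.isoformat()} ({days_left} days left)")
```

Plain substring matching is deliberately crude. It will over-match (every Imagen identifier is flagged, although only the generation endpoints are affected), which is the safer failure mode when the deadline is days away.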
Watch for xAI's incident post-mortem and any changes to their API deprecation timeline given this week's authentication issues. The combination of service instability and forced API migrations is testing user patience.
AWS continues expanding regional availability across multiple services. If you're in Singapore or other newly supported regions, expect more capability announcements as AWS builds out its global infrastructure advantage whilst competitors deal with platform stability issues.
The broader pattern this week shows the AI infrastructure landscape remains volatile. Betting everything on a single provider looks increasingly risky when authentication systems can fail and models can disappear with minimal notice.
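One way to hedge that bet is a thin fallback layer in front of your providers. Here's a rough sketch, assuming each provider is wrapped in a plain Python callable that takes a prompt and returns text. The function and class names are hypothetical and no real SDK is called, so treat it as a pattern rather than a drop-in implementation.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has errored."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (provider name, response).

    Each `call` is a hypothetical wrapper around one provider's client.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


if __name__ == "__main__":
    # Stub providers standing in for real clients.
    def primary(prompt: str) -> str:
        raise ConnectionError("authentication service unavailable")

    def secondary(prompt: str) -> str:
        return f"echo: {prompt}"

    used, text = complete_with_fallback(
        "hello", [("primary", primary), ("secondary", secondary)]
    )
    print(used, text)
```

A real deployment would also want timeouts, retries with backoff, and some check that the fallback model behaves acceptably for your prompts, since models from different providers are rarely interchangeable.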