IBM and UC Berkeley Diagnose Enterprise Agent Failures with ITBench and MAST
Action Required
Organizations relying on agentic systems for IT automation need to understand and mitigate the risk of cascading failures, particularly when using large open models.
AI Impact Summary
Using the ITBench benchmark and the MAST taxonomy, IBM and UC Berkeley have identified key failure modes in enterprise agent systems. The research finds that large open models such as GPT-OSS-120B exhibit cascading failure patterns, while frontier models such as Gemini-3-Flash fail in more isolated and predictable ways. Understanding these distinct failure modes is crucial for building more robust and reliable agentic systems, particularly for complex IT automation tasks.
Affected Systems
- Date: 18 Feb 2026
- Change type: capability
- Severity: high