IBM and UC Berkeley Diagnose Enterprise Agent Failures with ITBench and MAST

Action Required

Organizations relying on agentic systems for IT automation need to understand and mitigate the risk of cascading failures, particularly when using large open models.

AI Impact Summary

IBM and UC Berkeley have identified key failure modes in enterprise agent systems using the ITBench benchmark and the MAST taxonomy. The research finds that large open models such as GPT-OSS-120B exhibit cascading failure patterns, while frontier models such as Gemini-3-Flash show more isolated and predictable failures. Understanding these distinct failure modes is crucial for building more robust and reliable agentic systems, particularly for complex IT automation tasks.

Affected Systems

GPT-OSS-120B, Gemini-3-Flash

Date: 18 Feb 2026
Change type: capability
Severity: high
