IBM and UC Berkeley Diagnose Enterprise Agent Failures with MAST

AI Impact Summary

IBM and UC Berkeley have developed MAST, a framework for diagnosing failures in enterprise agent systems, and applied it to ITBench, a benchmark for SRE and automation tasks. The analysis finds that large open models such as GPT-OSS-120B exhibit cascading failure patterns, while models such as Gemini-3-Flash show more isolated failure modes. These failure signatures give teams a structured way to understand weaknesses in agentic systems and to target improvements at specific failure modes, rather than relying on simple success/failure metrics.

Affected Systems

ITBench, GPT-OSS-120B

Date: not specified
Change type: capability
Severity: info
