Open-source DeepResearch reproduction uses code-based agents to reach 55.15% GAIA score
AI Impact Summary
Open-source reproduction of DeepResearch demonstrates that an end-to-end agent stack combining an LLM, a code-based action layer, and lightweight browser/tooling can achieve 55.15% GAIA validation on the benchmark, outperforming JSON-based action baselines. The effort leverages smolagents, Magentic-One tooling, and OpenAI Operator-like capabilities, illustrating a practical path to open, competitive agent frameworks. For business teams, this signals rising parity between open-source agent stacks and proprietary solutions, with potential cost and control benefits as internal implementations can reach comparable performance without vendor lock-in.
Affected Systems
- Date
- Date not specified
- Change type
- capability
- Severity
- info