Structured CodeAgents: JSON thoughts and code for reliable tool calls via the OpenAI function calling API
AI Impact Summary
Forcing CodeAgents to emit a structured JSON blob containing explicit thoughts and executable Python code, delivered through the OpenAI function calling API, yields more reliable tool orchestration than either unstructured code output or pure function calls. The JSON structure removes the need to parse code out of markdown, makes the model state its plan before acting, and still lets the Python code keep state across tool calls so that steps compose. Benchmark results across the SmolBench suites (GAIA, MATH, SimpleQA, and Frames) show gains of 2 to 7 percentage points for capable models, with OpenAI models, Claude 3.7 Sonnet, and Qwen/Mistral variants benefiting the most. Smaller models instead pay a "structure tax": writing valid JSON and valid Python at the same time adds syntax overhead and raises the risk of parsing failures. To realize the gains, teams should make sure their pipelines can parse the JSON, execute the Python in a controlled environment, and use models with strong instruction-following (32B+ parameters or frontier models).
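The parse-then-execute loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the benchmarks: the field names `thoughts` and `code`, the helpers `parse_structured_step` and `run_step`, and the small builtins allow-list are all hypothetical. Note also that `exec` with a restricted namespace is not a real sandbox.

```python
import json


def parse_structured_step(raw: str) -> tuple[str, str]:
    """Parse one agent step from a JSON blob.

    Assumes (hypothetically) that the blob is an object with two string
    fields, "thoughts" and "code".
    """
    blob = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(blob, dict):
        raise ValueError("expected a JSON object")
    missing = {"thoughts", "code"} - blob.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return blob["thoughts"], blob["code"]


def run_step(raw: str, state: dict) -> str:
    """Execute the step's code in a shared namespace and return its thoughts.

    Reusing the same `state` dict across calls is what lets variables
    persist from one step to the next.
    """
    thoughts, code = parse_structured_step(raw)
    # NOTE: exec with a trimmed builtins dict is NOT a security boundary;
    # a real deployment would need an isolated interpreter or container.
    exec(code, state)
    return thoughts


# Shared namespace exposing only a few builtins to the agent code.
state = {"__builtins__": {"len": len, "sum": sum, "print": print}}

step1 = json.dumps({"thoughts": "Compute the total first.",
                    "code": "total = sum([1, 2, 3])"})
step2 = json.dumps({"thoughts": "Reuse the total from the previous step.",
                    "code": "average = total / 3"})

for step in (step1, step2):
    run_step(step, state)
```

Because both steps run against the same `state` dictionary, the second step can read `total` written by the first, which is the state-carrying behavior the summary attributes to code-based actions. Malformed output fails early at `json.loads` or the field check, before any code is executed.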
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info