LLM Mistake Correction Experiment — Keras Chatbot Arena
AI Impact Summary
This experiment investigates the ability of LLMs to correct their own mistakes when given explicit feedback, using a simplified calendar management API. The setup consists of a prompt instructing the LLM to act as a voice assistant, followed by a series of conversational turns designed to elicit errors. Results show that larger models (such as Gemma 2 9B) are more reliable, while smaller and older models struggle to consistently produce correct API calls, highlighting the need for robust error correction mechanisms in LLM-based assistants.
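The feedback loop described above can be sketched as follows. This is a minimal illustration, not the experiment's actual code: the calendar API schema, the `validate_call` helper, and the `generate` callable standing in for the model are all assumptions.

```python
import json

# Hypothetical calendar API surface the assistant must target (assumption).
VALID_ACTIONS = {"create_event": {"title", "date"}, "delete_event": {"title"}}

def validate_call(raw):
    """Return an error message for a malformed API call, or None if valid."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "Response is not valid JSON."
    action = call.get("action")
    if action not in VALID_ACTIONS:
        return f"Unknown action: {action!r}."
    missing = VALID_ACTIONS[action] - set(call.get("args", {}))
    if missing:
        return f"Missing arguments: {sorted(missing)}."
    return None

def correct_with_feedback(generate, user_turn, max_retries=3):
    """Ask the model for an API call; on error, feed the error back and retry."""
    prompt = user_turn
    for _ in range(max_retries):
        raw = generate(prompt)
        error = validate_call(raw)
        if error is None:
            return json.loads(raw)
        # Explicit feedback on the mistake: the mechanism under test.
        prompt = f"{user_turn}\nYour previous reply was invalid: {error} Try again."
    return None

# Stub "model" that fails once, then corrects itself (for illustration only).
replies = iter(['{"action": "create_event"}',
                '{"action": "create_event", '
                '"args": {"title": "Standup", "date": "2024-06-01"}}'])
result = correct_with_feedback(lambda p: next(replies),
                               "Schedule a standup on June 1st.")
```

In the experiment itself, the validator's error message plays the role of the explicit feedback given to the model; reliable models recover within one retry, weaker ones do not.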
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info