Lighthouz and Hugging Face launch Chatbot Guardrails Arena to benchmark privacy across 12 LLMs
AI Impact Summary
The Chatbot Guardrails Arena is a sanctioned stress test in which two anonymous LLMs with guardrails are probed with adversarial prompts designed to make them reveal sensitive data from a fictional bank. The setup benchmarks the privacy-preserving capabilities of several models (gpt-3.5-turbo-1106, Gemini-Pro, Llama-2-70b-chat-hf, Mixtral-8x7B-Instruct-v0.1) and guardrail implementations (NVIDIA NeMo Guardrails, Meta LlamaGuard). A public leaderboard with Elo rankings will surface practical privacy strengths and gaps, guiding enterprise decisions on model-guardrail stacks. The exercise highlights concrete leakage risks (names, SSNs, account numbers, balances) and informs future privacy-preserving AI development.
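To illustrate how an Elo ranking can be derived from pairwise comparisons of this kind, the following is a minimal sketch of the standard Elo update rule. It is not the Arena's actual implementation: the K-factor of 32, the starting rating of 1000, and the system names in the usage example are assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was judged more secure, 0.0 if B was, 0.5 for a tie.
    k is an assumed K-factor; the Arena's real value is not stated in the source.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical usage: both systems start at an assumed rating of 1000,
# and a voter judges system_x more secure than system_y.
ratings = {"system_x": 1000.0, "system_y": 1000.0}
ratings["system_x"], ratings["system_y"] = update_elo(
    ratings["system_x"], ratings["system_y"], score_a=1.0
)
```

Because the update is zero-sum, the points the winner gains equal the points the loser gives up, so a leaderboard built this way reflects relative rather than absolute privacy strength.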
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info