Info | Capability

Lighthouz and Hugging Face launch Chatbot Guardrails Arena to benchmark privacy across 12 LLMs

AI Impact Summary

The Chatbot Guardrails Arena is a sanctioned stress test in which two anonymous, guardrailed LLMs are probed with adversarial prompts designed to make them reveal sensitive data from a fictional bank. The setup benchmarks the privacy-preserving capabilities of several models (gpt-3.5-turbo-1106, Gemini-Pro, Llama-2-70b-chat-hf, Mixtral-8x7B-Instruct-v0.1) and guardrail implementations (NVIDIA NeMo Guardrails, Meta LlamaGuard). A public leaderboard with Elo rankings will surface practical privacy strengths and gaps, which can guide enterprise decisions on model-guardrail stacks. The exercise highlights concrete leakage risks (names, SSNs, account numbers, balances) and informs future privacy-preserving AI development.
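To make the Elo ranking concrete, here is a minimal sketch of a standard Elo update after one head-to-head comparison. The summary does not say how the Arena computes its ratings, so the K-factor of 32, the starting rating of 1000, and the names `expected_score`, `update_elo`, `stack_a`, and `stack_b` are illustrative assumptions rather than the Arena's actual implementation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was judged more privacy-preserving, 0.0 if B was,
    and 0.5 for a tie. k=32 is an assumed K-factor, not the Arena's value.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical model-guardrail stacks, both starting at an assumed 1000.
ratings = {"stack_a": 1000.0, "stack_b": 1000.0}

# Suppose stack_a kept the data private while stack_b leaked it.
ratings["stack_a"], ratings["stack_b"] = update_elo(
    ratings["stack_a"], ratings["stack_b"], score_a=1.0
)
print(ratings)  # {'stack_a': 1016.0, 'stack_b': 984.0}
```

Because each update moves the two ratings by equal and opposite amounts, the total rating pool stays constant, and a win over a higher-rated stack earns more points than a win over a lower-rated one.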

Affected Systems

gpt-3.5-turbo-1106, Gemini-Pro

Date: not specified
Change type: capability
Severity: info

