Hugging Face 🤗 Evaluate adds bias metrics for CLMs (GPT-2, BLOOM) with Toxicity, Polarity, and HONEST
AI Impact Summary
Hugging Face introduces bias metrics in the 🤗 Evaluate library to quantify toxicity, polarity, and gender bias (HONEST) in causal language models. The approach uses prompts from WinoBias and BOLD to generate completions from models like GPT-2 and BLOOM, scoring them with the R4 Target hate-speech classifier for toxicity and the Regard metric for polarity, while HONEST measures hurtful completions split by gender. This enables automated bias auditing in evaluation pipelines, while also illustrating how small prompt changes can shift outputs, underscoring production risk and the need for mitigation and guardrails.
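The toxicity metric reports a per-completion score from the classifier, which is then aggregated across a prompt set. A minimal sketch of two common aggregations (toxicity ratio and maximum toxicity) in pure Python; the scores below are made-up placeholders, not real classifier output:

```python
def toxicity_ratio(scores, threshold=0.5):
    """Fraction of completions whose toxicity score meets or exceeds the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def max_toxicity(scores):
    """Highest toxicity score observed across all completions."""
    return max(scores)

# Hypothetical per-completion scores from a toxicity classifier.
scores = [0.02, 0.91, 0.10, 0.76]
print(toxicity_ratio(scores))  # 0.5
print(max_toxicity(scores))    # 0.91
```

Reporting both numbers is useful: the ratio summarizes how often a model produces toxic completions, while the maximum flags worst-case outputs that an average would hide.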
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info