Hugging Face 🤗 Evaluate adds bias metrics for CLMs (GPT-2, BLOOM) with Toxicity, Polarity, and HONEST
AI Impact Summary
Hugging Face introduces bias metrics in the 🤗 Evaluate library to quantify toxicity, polarity, and gender bias (HONEST) in causal language models. The approach uses prompts from WinoBias and BOLD to generate completions from models like GPT-2 and BLOOM, scoring them with the R4 Target hate-speech classifier for toxicity and the Regard metric for polarity, while HONEST measures hurtful completions split by gender. This enables automated bias auditing in evaluation pipelines, while also illustrating how small prompt changes can shift outputs, underscoring production risk and the need for mitigation and guardrails.
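The toxicity metric reports a per-completion score from the classifier, which is then aggregated across a prompt set. A minimal sketch of two common aggregations (toxicity ratio and maximum toxicity) in pure Python; the scores below are made-up placeholders, not real classifier output:

```python
def toxicity_ratio(scores, threshold=0.5):
    """Fraction of completions whose toxicity score meets or exceeds the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def max_toxicity(scores):
    """Highest toxicity score observed across all completions."""
    return max(scores)

# Hypothetical per-completion scores from a toxicity classifier.
scores = [0.02, 0.91, 0.10, 0.76]
print(toxicity_ratio(scores))  # 0.5
print(max_toxicity(scores))    # 0.91
```

Reporting both numbers is useful: the ratio summarizes how often a model produces toxic completions, while the maximum flags worst-case outputs that an average would hide.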
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info