ServiceNow AprielGuard: New LLM Safety and Adversarial Robustness Model
Action Required
Organizations deploying agentic LLM systems should evaluate AprielGuard as a guardrail layer: it targets a broader range of security threats than typical content filters, reducing the risk of compromised systems and harmful outputs.
AI Impact Summary
ServiceNow is introducing AprielGuard, a new 8B-parameter model designed to protect LLM systems from a wide range of safety and adversarial risks, including jailbreaks, prompt injections, and memory manipulation. This capability is crucial as modern LLMs increasingly operate as agentic systems with complex reasoning and tool usage, making them vulnerable to sophisticated attacks. The model's dual-mode operation (reasoning and fast classification) and its training on a diverse synthetic dataset, including long-context use cases, demonstrate a proactive approach to securing agentic workflows.
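The dual-mode design described above can be sketched as a two-tier gate: a cheap fast pass screens every input, and only uncertain cases escalate to the slower reasoning pass. The sketch below is purely illustrative, using toy heuristics in place of the actual model; the names (`Verdict`, `guard_check`, the patterns, and the confidence threshold) are assumptions, not AprielGuard's real API.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for the fast classifier's signal.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal .*system prompt",
    r"disregard .*guidelines",
]

@dataclass
class Verdict:
    unsafe: bool
    confidence: float
    mode: str  # "fast" or "reasoning"

def fast_classify(text: str) -> Verdict:
    """Cheap pattern-based screen, standing in for the fast classification mode."""
    hits = sum(bool(re.search(p, text, re.I)) for p in INJECTION_PATTERNS)
    if hits:
        return Verdict(True, min(1.0, 0.5 + 0.25 * hits), "fast")
    return Verdict(False, 0.6, "fast")

def reasoning_classify(text: str) -> Verdict:
    """Placeholder for the slower reasoning mode: a deeper (here, toy) analysis."""
    suspicious = any(tok in text.lower()
                     for tok in ("override", "jailbreak", "exfiltrate"))
    return Verdict(suspicious, 0.9, "reasoning")

def guard_check(text: str, threshold: float = 0.75) -> Verdict:
    # Fast pass first; escalate to the reasoning mode only when uncertain.
    v = fast_classify(text)
    if v.confidence < threshold:
        return reasoning_classify(text)
    return v
```

In a real deployment the two tiers would be the same guard model run with and without its reasoning trace, letting operators trade latency against scrutiny per request.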
Affected Systems
- Date: 23 Dec 2025
- Change type: capability
- Severity: high