LlamaFirewall
Open-source guardrail framework for securing AI agents
LlamaFirewall is profiled here as a Testing tool for engineering teams.
Description
LlamaFirewall is an open-source guardrail framework from Meta, released in 2025, that acts as a real-time layer of defense for language model agents. It combines several guardrails into one policy engine: PromptGuard 2 detects jailbreak attempts, Agent Alignment Checks audit an agent's chain of thought for signs of goal hijacking, and CodeShield runs static analysis on generated code to catch insecure patterns. Developers can extend it with custom regex and model-based scanners. Meta uses LlamaFirewall in production and has open-sourced it so that teams can compose their own agent defenses.
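A minimal, self-contained Python sketch can illustrate the idea of composing scanners in one policy engine. It is not LlamaFirewall's API: the names `Decision`, `ScanResult`, `regex_scanner`, and `PolicyEngine`, the role keys, and the example patterns are all hypothetical. The sketch assumes each scanner is a plain function that inspects text and returns a verdict, and that the engine keeps the most severe verdict among the scanners registered for a message's role.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Decision(Enum):
    # Hypothetical verdicts, ordered by severity so the strictest one wins.
    ALLOW = 0
    HUMAN_REVIEW = 1
    BLOCK = 2


@dataclass
class ScanResult:
    decision: Decision
    reason: str


# A scanner is any callable that inspects text and returns a ScanResult.
Scanner = Callable[[str], ScanResult]


def regex_scanner(pattern: str, reason: str) -> Scanner:
    """Build a hypothetical custom scanner that blocks text matching `pattern`."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def scan(text: str) -> ScanResult:
        if compiled.search(text):
            return ScanResult(Decision.BLOCK, reason)
        return ScanResult(Decision.ALLOW, "no match")

    return scan


class PolicyEngine:
    """Runs every scanner registered for a role and keeps the most severe result."""

    def __init__(self, scanners: Dict[str, List[Scanner]]):
        self.scanners = scanners

    def scan(self, role: str, text: str) -> ScanResult:
        final = ScanResult(Decision.ALLOW, "all scanners passed")
        for scanner in self.scanners.get(role, []):
            result = scanner(text)
            if result.decision.value > final.decision.value:
                final = result
            if final.decision is Decision.BLOCK:
                break  # nothing is stricter than BLOCK, so stop early
        return final


# Usage: one layer for user input, another for the agent's output.
engine = PolicyEngine({
    "user": [regex_scanner(r"ignore (all )?previous instructions", "possible jailbreak")],
    "assistant": [regex_scanner(r"\bAKIA[0-9A-Z]{16}\b", "AWS access key in output")],
})
print(engine.scan("user", "Please ignore previous instructions").decision.name)
```

Keeping the strictest verdict means a single blocking scanner is enough to stop a message, which reflects the layered-defense idea; a real deployment would place model-based scanners alongside simple regex rules like these.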
Key Capabilities:
PromptGuard 2 for real-time detection of jailbreak attempts
Agent Alignment Checks that audit agent reasoning for goal hijacking
CodeShield static analysis that flags insecure generated code
A unified policy engine that composes multiple scanners into a pipeline
Custom regex and model-based scanners for application-specific risks
Layered defenses covering user input, agent reasoning, and code output
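Static analysis of generated code can be illustrated with Python's standard `ast` module. In this minimal sketch, the rule set (`eval`/`exec` calls and `md5`/`sha1` hash calls) and the function name `find_insecure_patterns` are hypothetical and chosen only for illustration; CodeShield's actual rules and engine differ. What it shares with any static analyzer is that the generated code is parsed and inspected, never executed.

```python
import ast
from typing import List, Tuple

# Hypothetical rule set for illustration only; not CodeShield's rules.
DANGEROUS_CALLS = {"eval", "exec"}
WEAK_HASHES = {"md5", "sha1"}


def find_insecure_patterns(source: str) -> List[Tuple[int, str]]:
    """Parse Python source without running it and report risky calls by line."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Bare calls such as eval(user_expression)
        if isinstance(func, ast.Name) and func.id in DANGEROUS_CALLS:
            findings.append((node.lineno, f"call to {func.id}()"))
        # Attribute calls such as hashlib.md5(data)
        elif isinstance(func, ast.Attribute) and func.attr in WEAK_HASHES:
            findings.append((node.lineno, f"weak hash {func.attr}()"))
    # ast.walk is breadth-first, so sort findings back into source order.
    return sorted(findings)


# Usage: code an agent might have generated (undefined names are fine,
# because the code is only parsed, never executed).
generated = """
import hashlib
digest = hashlib.md5(password.encode()).hexdigest()
result = eval(user_expression)
"""
for lineno, message in find_insecure_patterns(generated):
    print(f"line {lineno}: {message}")
```

A check this shallow misses indirect forms such as `hashlib.new("md5")`, which is why production tools rely on much larger rule sets.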
Alternative tools
- HiddenLayer
Security platform for protecting machine learning models
- Datafold
Data diffing and regression testing for data teams
- Gentrace
Testing and evaluation for generative AI applications
- HoneyHive
Evaluation and observability platform for AI agents
- Sentry
Error tracking and performance monitoring for developers
- QA Wolf
Managed end-to-end test creation and maintenance service
