LlamaFirewall

Open-source guardrail framework for securing AI agents

Testing LLM Backend Observability Guardrails Open Source

Description

LlamaFirewall is an open-source guardrail framework from Meta, released in 2025, that serves as a real-time layer of defense for language model agents. It combines several guardrails into one policy engine: PromptGuard 2 detects jailbreak attempts, Agent Alignment Checks audit an agent's chain of thought for signs of goal hijacking, and CodeShield runs static analysis on generated code to catch insecure patterns. Developers can extend it with custom regex and model-based scanners. Meta runs LlamaFirewall in production and has released it so that teams can compose their own agent defenses.

Key Capabilities:

PromptGuard 2 for real-time detection of jailbreak attempts
Agent Alignment Checks that audit agent reasoning for goal hijacking
CodeShield static analysis that flags insecure generated code
A unified policy engine that composes multiple scanners into a pipeline
Custom regex and model-based scanners for application-specific risks
Layered defenses covering user input, agent reasoning, and code output
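
The layered pipeline idea above can be illustrated with a minimal Python sketch. This is not LlamaFirewall's API: the names `Stage`, `PolicyEngine`, and `regex_scanner` are hypothetical, and simple regular expressions stand in for the model-based PromptGuard 2 and Agent Alignment Checks, which this sketch does not reproduce. It only shows how scanners might be registered per stage and composed so that the first blocking result wins.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    # Hypothetical stage names mirroring the three layers in the list above.
    USER_INPUT = "user_input"
    AGENT_REASONING = "agent_reasoning"
    CODE_OUTPUT = "code_output"


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


@dataclass
class ScanResult:
    decision: Decision
    scanner: str = ""
    reason: str = ""


# A scanner is any callable that maps text to a ScanResult.
Scanner = Callable[[str], ScanResult]


def regex_scanner(name: str, pattern: str, reason: str) -> Scanner:
    """Build a custom scanner that blocks text matching `pattern`."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def scan(text: str) -> ScanResult:
        if compiled.search(text):
            return ScanResult(Decision.BLOCK, name, reason)
        return ScanResult(Decision.ALLOW, name)

    return scan


class PolicyEngine:
    """Runs the scanners registered for a stage; the first BLOCK wins."""

    def __init__(self, scanners: Dict[Stage, List[Scanner]]):
        self.scanners = scanners

    def scan(self, stage: Stage, text: str) -> ScanResult:
        for scanner in self.scanners.get(stage, []):
            result = scanner(text)
            if result.decision is Decision.BLOCK:
                return result
        return ScanResult(Decision.ALLOW)


# Illustrative patterns only; real deployments would need far broader coverage.
engine = PolicyEngine({
    Stage.USER_INPUT: [
        regex_scanner("override_phrase",
                      r"ignore (all )?(previous|prior) instructions",
                      "possible jailbreak phrasing"),
    ],
    Stage.CODE_OUTPUT: [
        regex_scanner("weak_hash", r"hashlib\.md5\(",
                      "weak hash function in generated code"),
        regex_scanner("shell_true", r"shell\s*=\s*True",
                      "subprocess call with shell=True"),
    ],
})

result = engine.scan(Stage.USER_INPUT, "Please ignore all previous instructions.")
print(result.decision.value, result.scanner)
```

Keying scanners by stage keeps input checks, reasoning checks, and code checks independent, so a team can add an application-specific pattern to one layer without touching the others.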

Alternative tools

HiddenLayer: Security platform for protecting machine learning models
Datafold: Data diffing and regression testing for data teams
Gentrace: Testing and evaluation for generative AI applications
HoneyHive: Evaluation and observability platform for AI agents
Sentry: Error tracking and performance monitoring for developers
QA Wolf: Managed end-to-end test creation and maintenance service
