Promptfoo

Test and red team LLM applications from the command line

Testing, Prompt Engineering, DevOps, Evaluation, Guardrails, Open Source

Description

Promptfoo is an open-source CLI and library for evaluating and red teaming LLM applications, originally built by Ian Webster and Michael D'Angelo and acquired by OpenAI in March 2026. It remains MIT-licensed and open source post-acquisition. Where DeepEval targets Python engineers who think in Pytest and Ragas targets RAG-specific metrics, Promptfoo targets DevOps and security teams who prefer declarative YAML configs and CLI-driven workflows. Its defining capability is that it handles both evaluation and adversarial red teaming within a single tool, covering vulnerability classes that dedicated evaluation libraries do not address.

Key Capabilities

CLI and YAML-driven test runner: Tests are defined in version-controlled YAML files and executed via npx promptfoo eval, with no Python environment or test framework required
Red teaming with 50+ attack plugins: Generates adversarial prompts targeting prompt injection, jailbreaks, PII leakage, SQL injection, excessive agency, and hallucination across single-turn and multi-turn sessions
Compliance framework presets: Ships with OWASP LLM Top 10, OWASP Agentic Top 10, NIST AI RMF, and MITRE ATLAS presets that activate entire vulnerability suites from a single config line
Agent-specific security testing: Dedicated plugins for goal hijacking, tool chain attacks, privilege escalation, RBAC bypass, and coding agent vulnerabilities including repository prompt injection and sandbox escape
50+ LLM provider support: Runs evaluations against OpenAI, Anthropic, Mistral, Azure, Groq, Cohere, and locally hosted models in a single comparison run
CI/CD integration via GitHub Action: The promptfoo/promptfoo-action repository provides a dedicated GitHub Action that blocks deploys when evaluations or red team scans fall below defined thresholds
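
As a rough illustration of the YAML-driven workflow described above, a minimal config might look like the sketch below. The file name promptfooconfig.yaml is the one the CLI conventionally looks for. The prompt text, the ticket variable, the two model identifiers, and the assertion values are all assumptions chosen for illustration, not part of the listing.

```yaml
# promptfooconfig.yaml (illustrative sketch; prompt, models, and test data are assumptions)
description: "Compare two providers on a ticket summarization prompt"

prompts:
  # {{ticket}} is a hypothetical variable filled in from each test case below
  - "Summarize this support ticket in one sentence: {{ticket}}"

providers:
  # Example model identifiers only; substitute the models you actually use
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      ticket: "My invoice for March was charged twice and I need a refund."
    assert:
      # Case-insensitive substring check on the model output
      - type: icontains
        value: "refund"
      # Model-graded check; this needs a grading provider to be configured
      - type: llm-rubric
        value: "Is a single sentence and mentions the duplicate charge"
```

With provider API keys set in the environment, running npx promptfoo eval in the same directory should execute every test case against both providers and report pass or fail per assertion.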

Alternative tools

Claude Code: Agentic coding tool that runs in your terminal
OpenAI Codex CLI: Terminal coding agent built on OpenAI reasoning models
Aider: AI pair programming in your terminal
Cline: Open-source AI coding agent for any editor
Braintrust Evals: Trace every step your LLM agent takes, from prompt to response
Giskard: Scan AI agents for vulnerabilities before and after deployment