Promptfoo
Test and red team LLM applications from the command line
Description
Promptfoo is an open-source CLI and library for evaluating and red teaming LLM applications, originally built by Ian Webster and Michael D'Angelo and acquired by OpenAI in March 2026. It remains MIT-licensed and open source after the acquisition. Where DeepEval targets Python engineers who think in Pytest and Ragas targets RAG-specific metrics, Promptfoo targets DevOps and security teams who prefer declarative YAML configs and CLI-driven workflows. Its defining capability is combining evaluation and adversarial red teaming in a single tool, which lets it cover vulnerability classes that evaluation-only libraries do not address.
Key Capabilities
- CLI and YAML-driven test runner: Tests are defined in version-controlled YAML files and executed via npx promptfoo eval, with no Python environment or test framework required.
- Red teaming with 50+ attack plugins: Generates adversarial prompts targeting prompt injection, jailbreaks, PII leakage, SQL injection, excessive agency, and hallucination across single-turn and multi-turn sessions.
- Compliance framework presets: Ships with OWASP LLM Top 10, OWASP Agentic Top 10, NIST AI RMF, and MITRE ATLAS presets that activate entire vulnerability suites from a single config line.
- Agent-specific security testing: Dedicated plugins for goal hijacking, tool chain attacks, privilege escalation, RBAC bypass, and coding agent vulnerabilities, including repository prompt injection and sandbox escape.
- Support for 50+ LLM providers: Runs evaluations against OpenAI, Anthropic, Mistral, Azure, Groq, Cohere, and locally hosted models in a single comparison run.
- CI/CD integration via GitHub Action: The promptfoo/promptfoo-action repository provides a dedicated GitHub Action that blocks deploys when evaluations or red team scans fall below defined thresholds.
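To make the config-driven test runner concrete, here is a minimal sketch of what a small evaluation suite might look like. It is written in Python only to assemble and print the config; the build_config helper, the example prompt, and the test cases are hypothetical. The top-level keys (prompts, providers, tests), the icontains assertion type, and the openai:gpt-4o-mini provider id reflect our understanding of Promptfoo's config format and should be checked against the current documentation before use.

```python
import json


def build_config(prompt, providers, cases):
    """Assemble a Promptfoo-style eval config as a plain dict.

    Hypothetical helper for illustration; the key names are assumed
    from Promptfoo's documented config layout.
    """
    return {
        "description": "Sketch of a minimal eval suite",
        "prompts": [prompt],
        "providers": list(providers),
        "tests": [
            {
                # Values substituted into the {{...}} placeholders of the prompt
                "vars": dict(variables),
                # Case-insensitive substring check on the model output
                "assert": [{"type": "icontains", "value": expected}],
            }
            for variables, expected in cases
        ],
    }


config = build_config(
    "Answer in one sentence: what is the capital of {{country}}?",
    ["openai:gpt-4o-mini"],  # assumed provider id, swap in your own
    [({"country": "France"}, "Paris"), ({"country": "Japan"}, "Tokyo")],
)

# JSON is a subset of YAML, so this text can be saved as a YAML config file.
config_text = json.dumps(config, indent=2)
print(config_text)
```

Saving the printed text as promptfooconfig.yaml in a project directory and running npx promptfoo eval there should, under these assumptions, run both test cases against the listed provider. In practice the file would normally be written by hand and kept in version control; generating it from code is only a convenience for this sketch.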
Alternative tools
- Claude Code
Agentic coding tool that runs in your terminal
- OpenAI Codex CLI
Terminal coding agent built on OpenAI reasoning models
- Aider
AI pair programming in your terminal
- Cline
Open-source AI coding agent for any editor
- Braintrust Evals
Trace every step your LLM agent takes, from prompt to response
- Giskard
Scan AI agents for vulnerabilities before and after deployment
