Arize Phoenix
Trace every step your LLM agent takes, from prompt to response
Description
Arize Phoenix is an open-source AI observability and evaluation platform built by Arize AI, a machine learning observability vendor founded in 2020. Phoenix launched in May 2023 at Arize's Observe summit, bringing the spans-and-traces model familiar from traditional APM tools into LLM application development. Unlike most OSS observability tools that reserve advanced features for paid tiers, Phoenix ships fully featured with no feature gates. The commercial counterpart, Arize AX, serves enterprise teams that need RBAC, SSO, audit trails, and higher trace volumes, but the open-source library itself is not artificially limited.
Key Capabilities
OpenTelemetry-native tracing: Phoenix instruments LLM calls, retrieval steps, tool executions, and agent reasoning through OpenInference, an open OTel-based telemetry standard that Arize maintains, keeping trace data portable across observability platforms
Broad framework and provider support: Auto-instrumentation covers the OpenAI Agents SDK, Claude Agent SDK, LangGraph, LlamaIndex, CrewAI, DSPy, Vercel AI SDK, and Mastra, alongside providers including Anthropic, AWS Bedrock, Google GenAI, and LiteLLM
LLM evals library: Pre-built evaluation templates for hallucination, summarization, and retrieval relevance run against any traced span, with support for custom LLM-as-judge templates and human annotation queues
Datasets and experiments: Traces can be grouped into datasets and run through different application versions side by side, producing comparison results that show whether a prompt or architecture change produced a measurable improvement
RAG embedding analysis: Clusters query and knowledge base embeddings to surface missing context, irrelevant retrievals, and semantically similar failure cases without manual log inspection
Span replay and prompt playground: Any production span can be replayed with modified inputs for targeted debugging, and the prompt playground runs side-by-side model comparisons without leaving the Phoenix interface
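To make the spans-and-traces model behind these capabilities concrete, here is a minimal standard-library sketch of how an agent run breaks into nested spans. It does not use Phoenix or OpenInference. ToyTracer, Span, answer, and the attribute keys are hypothetical names chosen for illustration, and the retrieval and model steps are placeholders.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    """One timed step in a trace (hypothetical type, not a Phoenix class)."""
    name: str
    kind: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class ToyTracer:
    """Collects finished spans in memory; stands in for a real exporter."""

    def __init__(self) -> None:
        self.finished: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def span(self, name: str, kind: str) -> Iterator[Span]:
        # A span opened inside another span becomes its child and
        # inherits the trace id; a top-level span starts a new trace.
        parent = self._stack[-1] if self._stack else None
        current = Span(
            name=name,
            kind=kind,
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            parent_id=parent.span_id if parent else None,
            start=time.perf_counter(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)


def answer(tracer: ToyTracer, question: str) -> str:
    """Placeholder agent: one retrieval step, then one 'model' step."""
    with tracer.span("agent", "AGENT") as root:
        root.attributes["input"] = question  # illustrative key
        with tracer.span("retrieve", "RETRIEVER") as retrieval:
            docs = ["Phoenix is open source."]  # stand-in for a vector search
            retrieval.attributes["document_count"] = len(docs)
        with tracer.span("llm_call", "LLM") as llm:
            reply = f"Based on {len(docs)} document(s): {docs[0]}"  # stand-in for a model call
            llm.attributes["output"] = reply
        root.attributes["output"] = reply
    return reply
```

Calling answer(ToyTracer(), "What is Phoenix?") leaves three spans in tracer.finished: the retrieval and model spans, both pointing at the agent span through parent_id, and the agent span itself as the root. A tracing backend works from this kind of parent-child record to draw a step-by-step view of a request.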
Alternative tools
- Claude Code
Agentic coding tool that runs in your terminal
- OpenAI Codex CLI
Terminal coding agent built on OpenAI reasoning models
- Aider
AI pair programming in your terminal
- Cline
Open-source AI coding agent for any editor
- Giskard
Scan AI agents for vulnerabilities before and after deployment
- Promptfoo
Test and red team LLM applications from the command line
