Arize AX
Enterprise platform for AI observability and evaluation
Arize AX is profiled here as a Testing tool for engineering teams. Read about features, pricing, and how it compares to related options in the tools directory.
Description
Arize AX is the commercial AI engineering platform from Arize AI, the company founded in 2020 by Jason Lopatecki and Aparna Dhinakaran that also maintains the open-source Phoenix project. It traces LLM and agent applications, evaluates outputs with online and offline evaluators, and monitors quality and drift in production at enterprise scale. Built-in experiments and a prompt workspace connect development-time iteration to live production data, giving teams one view of quality across the model lifecycle. Annotation queues and curated datasets turn real failures into evaluation cases that guide the next iteration.
Key Capabilities:
- Tracing for LLM and agent applications through OpenTelemetry
- Online and offline LLM-as-judge evaluations
- Production monitoring for drift, quality, and performance
- Experiments that compare prompts, models, and datasets
- Annotation queues and curated datasets for evaluation
- Enterprise deployment with role-based access and security controls
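To make the drift-monitoring capability concrete, here is a minimal sketch of one widely used drift statistic, the population stability index (PSI), computed between a baseline sample and a production sample. This is an illustration only: the listing does not say which drift metrics Arize AX computes or how, and the function name, binning scheme, and `eps` floor below are assumptions chosen for the example, not part of any Arize API.

```python
import math
from collections import Counter


def population_stability_index(baseline, production, bins=10, eps=1e-6):
    """Hypothetical helper: PSI between two numeric samples.

    Uses equal-width bins spanning the baseline range. Production values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    # Fall back to a width of 1.0 when every baseline value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = Counter(
            min(max(int((v - lo) / width), 0), bins - 1) for v in values
        )
        total = len(values)
        # Floor empty buckets at eps so the logarithm below stays defined.
        return [max(counts.get(i, 0) / total, eps) for i in range(bins)]

    expected = bucket_shares(baseline)
    actual = bucket_shares(production)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [0.5 + i / 200 for i in range(100)]
    print("same distribution:", population_stability_index(baseline, baseline))
    print("shifted distribution:", population_stability_index(baseline, shifted))
```

A PSI near zero means the production sample is distributed like the baseline, and larger values indicate a bigger shift. The alerting threshold is a judgment call that depends on the feature and the application.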
Alternative tools
- HELM
Reproducible, multi-scenario benchmarking of foundation models
- lm-evaluation-harness
Standard framework for benchmarking language models
- Storybook
Workshop for building and documenting UI components in isolation
- Zencoder
Repository-aware coding and unit-testing agents in your IDE
- Goose
Open-source local AI agent for engineering tasks
- Keploy
Generate API tests and mocks from real traffic
