Arize AX

Enterprise platform for AI observability and evaluation

Arize AX is profiled here as a Testing tool for engineering teams. Read about features, pricing, and how it compares to related options in the tools directory.

Testing | Prompt Engineering | LLM Observability | Evaluation | Agentic Capabilities | Free


Description

Arize AX is the commercial AI engineering platform from Arize AI, the company founded in 2020 by Jason Lopatecki and Aparna Dhinakaran that also maintains the open-source Phoenix project. It traces LLM and agent applications, evaluates outputs with online and offline evaluators, and monitors quality and drift in production at enterprise scale. Built-in experiments and a prompt workspace connect development-time iteration to live production data, giving teams one view of quality across the model lifecycle. Annotation queues and curated datasets turn real failures into evaluation cases that guide the next iteration.

Key Capabilities:

Tracing for LLM and agent applications through OpenTelemetry
Online and offline LLM-as-judge evaluations
Production monitoring for drift, quality, and performance
Experiments that compare prompts, models, and datasets
Annotation queues and curated datasets for evaluation
Enterprise deployment with role-based access and security controls
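
As a rough illustration of what drift monitoring over evaluation results can involve, the sketch below computes a population stability index (PSI) between a baseline sample and a production sample of evaluator labels. It is not Arize AX's implementation or API. The function name `psi`, the `eps` floor, and the sample labels are all hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def psi(baseline, production, eps=1e-6):
    """Population stability index between two samples of categorical labels.

    Hypothetical helper for illustration; not part of any Arize library.
    """
    if not baseline or not production:
        raise ValueError("both samples must be non-empty")

    base_counts = Counter(baseline)
    prod_counts = Counter(production)
    categories = set(base_counts) | set(prod_counts)

    total = 0.0
    for cat in categories:
        # Floor each proportion at eps so a category missing from one
        # sample does not produce log(0) or a division by zero.
        expected = max(base_counts[cat] / len(baseline), eps)
        actual = max(prod_counts[cat] / len(production), eps)
        total += (actual - expected) * math.log(actual / expected)
    return total


# Made-up evaluator labels: the failure rate rises from 10% to 30%.
baseline_labels = ["pass"] * 90 + ["fail"] * 10
production_labels = ["pass"] * 70 + ["fail"] * 30

drift = psi(baseline_labels, production_labels)
print(f"PSI = {drift:.3f}")
```

A PSI of zero means the two label distributions match, and larger values indicate a bigger shift. The threshold that should trigger an alert is a team-specific choice and is not specified here.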

Alternative tools

HELM: Reproducible, multi-scenario benchmarking of foundation models
lm-evaluation-harness: Standard framework for benchmarking language models
Storybook: Workshop for building and documenting UI components in isolation
Zencoder: Repository-aware coding and unit-testing agents in your IDE
Goose: Open-source local AI agent for engineering tasks
Keploy: Generate API tests and mocks from real traffic
