DevExplore
    Added 6/24/2026

    Arize AX

    Enterprise platform for AI observability and evaluation

    Arize AX is profiled here as a Testing tool for engineering teams. Read about features, pricing, and how it compares to related options in the tools directory.

    Tags: Testing, Prompt Engineering, LLM, Observability, Evaluation, Agentic Capabilities, Free

    Description

    Arize AX is the commercial AI engineering platform from Arize AI, the company founded in 2020 by Jason Lopatecki and Aparna Dhinakaran that also maintains the open-source Phoenix project. It traces LLM and agent applications, evaluates their outputs with online and offline evaluators, and monitors quality and drift in production at enterprise scale. Built-in experiments and a prompt workspace connect development-time iteration to live production data, giving teams one view of quality across the model lifecycle. Annotation queues and curated datasets turn real failures into evaluation cases that guide the next iteration.

    Key Capabilities:

    • Tracing for LLM and agent applications through OpenTelemetry

    • Online and offline LLM-as-judge evaluations

    • Production monitoring for drift, quality, and performance

    • Experiments that compare prompts, models, and datasets

    • Annotation queues and curated datasets for evaluation

    • Enterprise deployment with role-based access and security controls
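
To make the offline evaluation idea in the list above concrete, here is a minimal, library-free sketch of scoring a curated dataset and collecting the failures as new evaluation cases. It does not use the Arize AX API: `Case`, `keyword_judge`, and `run_offline_eval` are hypothetical names, and the keyword check is only a stand-in for a real LLM-as-judge call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    """One row of a curated evaluation dataset (hypothetical schema)."""
    question: str
    answer: str      # output produced by the application under test
    reference: str   # expected content from the curated dataset


def keyword_judge(case: Case) -> bool:
    """Stand-in for an LLM judge: pass if the reference text appears in the answer."""
    return case.reference.lower() in case.answer.lower()


def run_offline_eval(
    cases: List[Case], judge: Callable[[Case], bool]
) -> Tuple[float, List[Case]]:
    """Score every case with the judge; return the pass rate and the failing cases."""
    failures = [case for case in cases if not judge(case)]
    pass_rate = (len(cases) - len(failures)) / len(cases) if cases else 0.0
    return pass_rate, failures


dataset = [
    Case("Capital of France?", "The capital is Paris.", "Paris"),
    Case("2 + 2?", "It equals 5.", "4"),
]

pass_rate, failures = run_offline_eval(dataset, keyword_judge)
print(f"pass rate: {pass_rate:.2f}")
for case in failures:
    # Failing cases are the ones worth adding to an annotation queue.
    print(f"needs review: {case.question!r} -> {case.answer!r}")
```

In a real setup the judge would presumably call a model with a grading prompt, and the failing cases would feed an annotation queue or dataset; the loop structure stays the same.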

    Alternative tools

    • HELM

      Reproducible, multi-scenario benchmarking of foundation models

    • lm-evaluation-harness

      Standard framework for benchmarking language models

    • Storybook

      Workshop for building and documenting UI components in isolation

    • Zencoder

      Repository-aware coding and unit-testing agents in your IDE

    • Goose

      Open-source local AI agent for engineering tasks

    • Keploy

      Generate API tests and mocks from real traffic
