Evaluation tools

Showing 1-24 of 35 tools

Gentrace
Testing and evaluation for generative AI applications
Evaluation · Free
Basalt
Collaborative prompt management and deployment for AI teams
Prompt Management, Prompt Engineering, DevOps (+2 more) · Free Tier Available
Promptmetheus
Prompt engineering IDE for composing and testing LLM prompts
Prompt Management, Prompt Engineering, Evaluation · Free Tier Available
LLM Guard
Open-source security toolkit for LLM interactions
LLM, Evaluation, Guardrails · Open Source
Llama Guard
Open safeguard model for classifying LLM inputs and outputs
LLM, Evaluation, Guardrails · Open Source
HELM
Reproducible, multi-scenario benchmarking of foundation models
Evaluation · Open Source
lm-evaluation-harness
Standard framework for benchmarking language models
Evaluation · Open Source
Langtail
Collaborative prompt playground with testing and deployment
Prompt Management, Testing, Prompt Engineering (+5 more) · Free
Lunary
Open-source prompt management and observability for LLM apps
Prompt Management, LLM, Observability (+2 more) · Open Source
garak
Vulnerability scanner for large language models
Evaluation · Open Source
Momentic
AI-powered end-to-end tests written in plain English
Testing, DevOps, Evaluation (+1 more) · Enterprise
Freeplay
Prompt management, evals, and observability for product teams
Prompt Management, Testing, Prompt Engineering (+3 more) · Enterprise
DSPy
Declarative framework for programming and optimizing LLM pipelines
Prompt Management, Testing, Prompt Engineering (+5 more) · Open Source
Deepchecks
Validate ML models, LLM applications, and AI agent decisions across every development stage
Evaluation · Open Source
Klu.ai
Collaborative prompt engineering platform with multi-LLM evaluation and fine-tuning
Prompt Management, Testing, Prompt Engineering (+9 more) · Free
Humanloop
Prompt management and LLM evaluation platform; its team was acqui-hired by Anthropic, and the platform shut down in September 2025
Prompt Management, Testing, Prompt Engineering (+4 more) · Free
Flowise
Visual drag-and-drop builder for LLM workflows, agents, and RAG pipelines, now part of Workday
Prompt Engineering, RAG Framework, Evaluation (+3 more) · Open Source
Evidently AI
Evaluate, test, and monitor traditional ML models and LLM applications from one framework
Evaluation · Open Source
Vectara HHEM
Detect hallucinations in RAG outputs using a dedicated classification model
Evaluation · Open Source
Inspect AI
Evaluate frontier AI models for dangerous capabilities in sandboxed environments
Evaluation · Open Source
OpenAI Evals
Run reproducible benchmarks against OpenAI models and community-contributed eval suites
Evaluation · Open Source
UpTrain
Evaluate RAG pipelines with root cause analysis and a self-hosted dashboard
Evaluation · Open Source
Confident AI
The cloud platform built on DeepEval, the pytest-compatible LLM testing framework
Evaluation · Free Tier Available
Patronus AI
Score, benchmark, and stress-test LLM outputs for enterprise deployments
Evaluation · Free

Browse AI developer tools by category

The Tools Directory organizes curated AI software tools by workflow layer, from coding assistants and LLM platforms through vector databases, agent frameworks, and observability. Use the filters above or jump into a category below to narrow your shortlist.

Compare AI software tools

Side-by-side evaluation is faster when listings share a consistent format: pricing model, category tags, feature summaries, and links to official docs. Filter by free, freemium, or enterprise tiers, then open individual profiles to compare integrations and fit. For deeper analysis, read the expert buying guides and roundups of alternatives in our resources library.

Discover popular AI platforms

See what teams are adopting today: browse featured tools curated by the DevExplore editorial team, or sort by newest listings to discover emerging AI startups shipping developer products. Sponsored placements are always clearly labeled so you can distinguish editorial picks from paid visibility.

Find tools for your workflow

Not sure where to start? Map a production-ready stack with the AI Stack Builder, which recommends layer-by-layer options for RAG chatbots, AI agents, and LLM applications based on your use case, language, and scale. Bookmark listings as you research, then return when your requirements evolve. Vendors whose products are not yet listed can submit a tool for editorial review.

Categories

Coding Assistants

Vector Databases

LLM Platforms

Agent Frameworks

Prompt Management

AI Testing & Evaluation