Deepchecks
Validate ML models, LLM applications, and AI agent decisions across every development stage
Deepchecks is profiled here as a testing tool for engineering teams. Read about its features, pricing, and how it compares to related options in the tools directory.
Description
Deepchecks is an open-source ML and LLM testing platform founded in 2019 in Tel Aviv by Philip Tannor and Shir Chorev. Both are graduates of the IDF's Talpiot program and the Unit 8200 intelligence unit, and they had been working together since they were 18. The pair published an arXiv paper on ML testing methodology in March 2022 before raising a $14M seed round in June 2023, reflecting a research-first approach that distinguishes Deepchecks from most commercially led testing tools. Check Point Software acquired Deepchecks in May 2026, integrating it into Check Point's Agentic Network Security Orchestration platform. The open-source library remains accessible under its original license, though the commercial platform's roadmap is now directed by Check Point's enterprise security priorities.
Key Capabilities
ML model validation across the development lifecycle: Systematic data validation, feature drift detection, model performance testing, and segment-level error analysis run at the training, staging, and production stages, applying software CI/CD testing principles to ML
LLM evaluation with version comparison: Auto-scoring, business metric tracking, and side-by-side version comparison for LLM applications and RAG pipelines, covering answer quality, instruction following, and output faithfulness
Granular agent sub-task evaluation: Breaks complex agent executions into individual sub-tasks and scores each one using LLM judges, assessing tool selection, error recovery, and decision quality at both step and session level
Root cause analysis: Identifies the specific code-level origin of model failures rather than returning aggregate scores, reducing initial diagnosis time by up to 70% according to Deepchecks' own benchmarks
Flexible deployment including air-gapped environments: Runs as SaaS, virtual private cloud on GCP or Azure, bare-metal, or air-gapped, with native AWS integrations covering SageMaker Partner AI Apps, Bedrock, and GovCloud for regulated industries
Enterprise compliance and workflow integrations: SOC 2 Type 2, GDPR, and HIPAA compliance alongside Slack and PagerDuty integrations for routing validation alerts into existing operations workflows
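To make the feature drift detection capability above more concrete, here is a minimal, standard-library-only sketch of one common drift statistic, the Population Stability Index (PSI). This is not Deepchecks' implementation or API; the function name, the equal-width binning, and the 0.25 alert threshold (a widely used rule of thumb) are assumptions chosen purely for illustration.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Illustrative PSI between two samples of a single numeric feature.

    Hypothetical helper for this sketch, not part of any library.
    """
    if len(reference) == 0 or len(current) == 0:
        raise ValueError("both samples must be non-empty")

    # Bin edges come from the reference (e.g. training) sample only.
    lo, hi = min(reference), max(reference)
    if hi == lo:
        hi = lo + 1.0  # avoid zero-width bins for a constant feature
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Assumed rule-of-thumb threshold: PSI above 0.25 signals significant drift.
DRIFT_THRESHOLD = 0.25

train_feature = [i / 100 for i in range(100)]
prod_feature = [0.5 + i / 200 for i in range(100)]  # shifted toward higher values

psi = population_stability_index(train_feature, prod_feature)
print("drift detected" if psi > DRIFT_THRESHOLD else "no significant drift")
```

In a CI/CD-style workflow, a check like this would typically run once per feature against a stored training sample, and the pipeline would fail or raise an alert when the statistic crosses the chosen threshold.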
Alternative Tools
- CodeGeeX
Free open-source AI coding assistant from Tsinghua University
- Sourcery
Automated AI code reviewer for GitHub and GitLab pull requests
- Factory.ai
Autonomous software engineering agents for enterprise development teams
- Pythagora
Full-stack AI app builder with 14 specialized agents
- Refact.ai
Local-first AI coding agent with enterprise fine-tuning support
- Blackbox AI
Multi-model AI coding assistant with Chairman LLM orchestration
