DevExplore

    © 2026 DevExplore. All rights reserved.


    Added 6/11/2026

    Deepchecks

    Validate ML models, LLM applications, and AI agent decisions across every development stage

    Deepchecks is profiled here as a Testing tool for engineering teams. Read about its features, pricing, and how it compares to related options in the tools directory.

    Tags: Testing, LLM, Observability, Evaluation, Agentic Capabilities, Data Quality, Open Source

    Description

    Deepchecks is an open-source ML and LLM testing platform founded in 2019 in Tel Aviv by Philip Tannor and Shir Chorev. Both are alumni of the IDF's Talpiot program and the Unit 8200 intelligence unit, and they had been working together since they were 18. The pair published an arXiv paper on ML testing methodology in March 2022, then raised a $14M seed round in June 2023. This research-first approach distinguishes Deepchecks from most commercially led testing tools. Check Point Software acquired Deepchecks in May 2026 and integrated it into Check Point's Agentic Network Security Orchestration platform. The open-source library remains available under its original license, though the commercial platform's roadmap is now directed by Check Point's enterprise security priorities.

    Key Capabilities

    • ML model validation across the development lifecycle: Systematic data validation, feature drift detection, model performance testing, and segmentation error analysis run at the training, staging, and production stages, applying software CI/CD testing principles to ML

    • LLM evaluation with version comparison: Auto-scoring, business metric tracking, and side-by-side version comparison for LLM applications and RAG pipelines, covering answer quality, instruction following, and output faithfulness

    • Granular agent sub-task evaluation: Breaks complex agent executions into individual sub-tasks and scores each one using LLM judges, assessing tool selection, error recovery, and decision quality at both step and session level

    • Root cause analysis: Identifies the specific code-level origin of model failures rather than returning only aggregate scores, reducing initial diagnosis time by up to 70%, according to Deepchecks' own benchmarks

    • Flexible deployment including air-gapped environments: Runs as SaaS, in a virtual private cloud on GCP or Azure, on bare metal, or air-gapped, with native AWS integrations covering SageMaker Partner AI Apps, Bedrock, and GovCloud for regulated industries

    • Enterprise compliance and workflow integrations: SOC 2 Type 2, GDPR, and HIPAA compliance alongside Slack and PagerDuty integrations for routing validation alerts into existing operations workflows
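
To make the feature drift detection capability above more concrete, here is a minimal sketch of one common, generic drift measure, the Population Stability Index (PSI). This is not Deepchecks' implementation or API. The function name `psi`, the bin count, and the 0.25 alert threshold (a commonly cited rule of thumb) are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index for one numeric feature.

    Hypothetical helper for illustration only; not part of any library.
    Bins are equal-width and derived from the reference sample.
    """
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion so empty bins do not produce log(0).
        eps = 1e-6
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    train = [i / 100 for i in range(100)]   # stand-in for training data
    prod = [v + 0.5 for v in train]         # stand-in for shifted production data
    score = psi(train, prod)
    # Assumed threshold: 0.25 is a rule of thumb, not a Deepchecks default.
    print("drift detected" if score > 0.25 else "no significant drift", score)
```

A check like this could run as a gate at each stage (training, staging, production), comparing the incoming data against a stored reference sample and failing the pipeline when the score crosses the chosen threshold.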


    Alternative tools

    • CodeGeeX

      Free open-source AI coding assistant from Tsinghua University

    • Sourcery

      Automated AI code reviewer for GitHub and GitLab pull requests

    • Factory.ai

      Autonomous software engineering agents for enterprise development teams

    • Pythagora

      Full-stack AI app builder with 14 specialized agents

    • Refact.ai

      Local-first AI coding agent with enterprise fine-tuning support

    • Blackbox AI

      Multi-model AI coding assistant with Chairman LLM orchestration
