Soda
Data quality testing defined in a readable check language
Soda is listed here as a testing tool for engineering teams. Read about its features, pricing, and how it compares to related options in the tools directory.
Description
Soda is a data quality platform built around two pieces: Soda Core, an open-source CLI, and SodaCL, its check language. Engineers write checks for freshness, completeness, schema, and custom SQL conditions in YAML, then run them against warehouses and pipelines to catch bad data before it spreads downstream. Soda Cloud adds monitoring, anomaly detection, and collaboration on top of the open-source scanner. Because checks are written in a readable language, data and engineering teams can collaborate on quality rules without deep knowledge of the underlying tooling. The scanner plugs into orchestrators and transformation tools, so checks run as part of existing pipelines.
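To make the check types above concrete, here is a minimal sketch in plain Python of what schema, completeness, freshness, and custom SQL checks evaluate. It does not use Soda Core or SodaCL syntax. The `customers` table, its columns, the `run_checks` helper, and the one-day freshness threshold are all assumptions made up for this illustration, and an in-memory SQLite database stands in for a warehouse.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical column names, chosen only for this illustration.
REQUIRED_COLUMNS = {"id", "email", "created_at"}


def run_checks(conn, table, now, max_age):
    """Evaluate a few simple data quality checks; return {check name: passed}."""
    cur = conn.cursor()

    # Schema: every required column must exist in the table.
    columns = {row[1] for row in cur.execute(f"PRAGMA table_info({table})")}

    # Completeness: the table has rows, and no row is missing an email.
    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    missing_email = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE email IS NULL"
    ).fetchone()[0]

    # Freshness: the newest record must be younger than max_age.
    newest = cur.execute(f"SELECT MAX(created_at) FROM {table}").fetchone()[0]
    fresh = newest is not None and now - datetime.fromisoformat(newest) < max_age

    # Custom SQL condition: no two rows share an email address.
    duplicates = cur.execute(
        f"SELECT COUNT(*) FROM (SELECT email FROM {table} "
        "WHERE email IS NOT NULL GROUP BY email HAVING COUNT(*) > 1)"
    ).fetchone()[0]

    return {
        "schema": REQUIRED_COLUMNS <= columns,
        "row_count": row_count > 0,
        "missing_email": missing_email == 0,
        "freshness": fresh,
        "duplicate_email": duplicates == 0,
    }


# Build a tiny example table with one deliberately incomplete row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, created_at TEXT)")
now = datetime(2024, 1, 2, 12, 0, 0)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", (now - timedelta(hours=30)).isoformat()),
        (2, None, (now - timedelta(hours=2)).isoformat()),
    ],
)

results = run_checks(conn, "customers", now, timedelta(days=1))
for name, passed in results.items():
    print(f"{name}: {'pass' if passed else 'FAIL'}")
```

Here only the `missing_email` check fails, because one row has no email. In Soda, rules like these would be declared in YAML with SodaCL and evaluated by the scanner rather than written out as queries by hand.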
Key Capabilities:
- SodaCL language for declaring data quality checks
- Validation across Snowflake, BigQuery, Databricks, Postgres, and Spark
- Checks for freshness, completeness, validity, and schema
- Anomaly detection on metrics over time in Soda Cloud
- Pipeline integration with Airflow, dbt, and CI tools
- Apache 2.0 Soda Core with a managed cloud option
Alternative tools
- Elementary: dbt-native data observability and anomaly detection
- Arize AX: Enterprise platform for AI observability and evaluation
- HELM: Reproducible, multi-scenario benchmarking of foundation models
- lm-evaluation-harness: Standard framework for benchmarking language models
- Storybook: Workshop for building and documenting UI components in isolation
- Zencoder: Repository-aware coding and unit-testing agents in your IDE
