Apache Druid

Real-time analytics database for sub-second queries

Apache Druid is profiled here as a Backend tool for engineering teams. Read about features, pricing, and how it compares to related options in the tools directory.

Backend, Storage, Observability, Data Ingestion, Data Warehouse, Open Source

Description

Apache Druid is an open-source real-time analytics database, created at Metamarkets by Eric Tschetter and Fangjin Yang, designed for fast aggregation queries over large event streams. It ingests data from streaming sources like Kafka and from batch files, then serves slice-and-dice queries with sub-second latency across high-cardinality data. Druid powers user-facing analytics and operational dashboards where many users run interactive queries at once, and its distributed design scales ingestion and querying independently.

Key Capabilities:

Sub-second aggregation queries over large, high-cardinality datasets
Streaming ingestion from Kafka and Kinesis with exactly-once handling
Batch ingestion from files and object storage
A columnar format with bitmap indexes tuned for analytics
A distributed architecture that scales ingestion and querying separately
Native support for time-series and event data
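
To make the query side of these capabilities concrete, here is a minimal sketch of a top-N style aggregation sent to Druid's SQL HTTP endpoint (`/druid/v2/sql`). The host and port, the `clickstream` datasource, and the `country` dimension are assumptions for illustration only; the helper names `build_sql_request` and `run_query` are hypothetical and not part of any Druid client library.

```python
import json
import urllib.request

# Assumed endpoint: the Router's SQL API on a local quickstart-style setup.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_sql_request(datasource: str, dimension: str, limit: int = 10) -> dict:
    """Build a request body that counts events per dimension value over the last hour.

    The datasource and dimension names are interpolated directly, so pass only
    trusted identifiers here (this sketch does no escaping).
    """
    query = (
        f'SELECT "{dimension}", COUNT(*) AS events '
        f'FROM "{datasource}" '
        "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR "
        f'GROUP BY "{dimension}" '
        "ORDER BY events DESC "
        f"LIMIT {int(limit)}"
    )
    # "object" asks Druid to return a JSON array of row objects.
    return {"query": query, "resultFormat": "object"}


def run_query(payload: dict, url: str = DRUID_SQL_URL) -> list:
    """POST the payload to Druid and return the decoded rows (needs a running cluster)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request body; call run_query(payload) against a live cluster.
    payload = build_sql_request("clickstream", "country")
    print(json.dumps(payload, indent=2))
```

Filtering on `__time`, Druid's built-in timestamp column, lets the query skip data outside the requested interval, which matters for the time-series workloads listed above.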

Alternative tools

Anomalo: Automated data quality monitoring with machine learning
RudderStack: Warehouse-native customer data pipeline and Segment alternative
Storj: Distributed S3-compatible storage across a global network
Wasabi: S3-compatible hot cloud storage without egress fees
Better Auth: Framework-agnostic authentication library for TypeScript
Ory: Open-source identity, authentication, and access control
