Designing a Real-Time Analytics Platform at Scale: ClickHouse, Apache Druid, Lambda Architecture & Streaming Analytics

Building an analytics platform that ingests 10 billion events per day while delivering sub-second query responses requires careful architectural decisions. This comprehensive guide walks through Lambda vs Kappa architecture trade-offs, ClickHouse vs Apache Druid vs Pinot comparisons, Apache Flink stream processing, and the materialized view strategies that power dashboards at scale.

Md Sanwar Hossain April 11, 2026 24 min read System Design

TL;DR — Architecture Decision in One Paragraph

"Use Lambda architecture for combined batch + streaming analytics. ClickHouse for sub-second OLAP queries. Apache Druid for sub-second slice-and-dice. Apache Flink for stateful stream processing. Kafka as the unified event backbone. Pre-aggregate with materialized views for dashboard queries."

Table of Contents

  1. Requirements & Scale Estimation
  2. Lambda vs Kappa Architecture Trade-offs
  3. Ingestion Layer: Kafka, Kinesis & Pulsar
  4. Stream Processing: Apache Flink at Scale
  5. Batch Layer: Spark & Data Lake Architecture
  6. OLAP Storage: ClickHouse vs Apache Druid vs Pinot
  7. Query Layer & API Design
  8. Materialized Views & Pre-Aggregation Strategies
  9. Reliability, Data Quality & SLOs
  10. Real-World Case Studies

1. Requirements & Scale Estimation

Before designing the system, we must define the scale, latency targets, and data retention requirements. Getting these wrong leads to over-engineered or under-provisioned infrastructure that is expensive to fix in production.

Functional Requirements

Non-Functional Requirements

Scale Math & Capacity Estimation

Let's work through the numbers systematically before choosing technology:

| Component | Count | Spec | Purpose |
|---|---|---|---|
| Kafka Brokers | 9 | 32 vCPU, 128 GB RAM, 10 TB SSD | Event backbone, 7-day retention |
| Flink TaskManagers | 20 | 16 vCPU, 64 GB RAM | Stream processing, windowed aggregations |
| ClickHouse Nodes | 12 | 32 vCPU, 256 GB RAM, 20 TB NVMe | OLAP hot tier, sub-second queries |
| Query Servers | 6 | 8 vCPU, 32 GB RAM | REST/GraphQL API, caching, routing |
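The headline numbers fall out of simple arithmetic. A quick sanity check in Python — the ~1 KB average event size and the 10:1 compression ratio are assumptions for illustration, not measured figures:

```python
# Back-of-envelope capacity estimation for 10B events/day.
# AVG_EVENT_BYTES and COMPRESSION_RATIO are illustrative assumptions.

EVENTS_PER_DAY = 10_000_000_000
AVG_EVENT_BYTES = 1_000          # assumed average serialized event size
COMPRESSION_RATIO = 10           # typical columnar compression (see OLAP section)

events_per_second = EVENTS_PER_DAY / 86_400
raw_bytes_per_day = EVENTS_PER_DAY * AVG_EVENT_BYTES
compressed_tb_per_day = raw_bytes_per_day / COMPRESSION_RATIO / 1e12

print(f"~{events_per_second:,.0f} events/sec average")
print(f"~{raw_bytes_per_day / 1e12:.0f} TB/day raw, "
      f"~{compressed_tb_per_day:.0f} TB/day compressed")
```

This is where the "115K+ events/second" figure used throughout the article comes from — and peak traffic is typically 2–3× the daily average, which the broker and TaskManager counts must absorb.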
Real-Time Analytics Platform Architecture — Lambda and Kappa patterns with ClickHouse and Apache Druid. Source: mdsanwarhossain.me

2. Lambda vs Kappa Architecture Trade-offs

The architectural pattern you choose determines system complexity, operational burden, and your ability to correct historical data. Both Lambda and Kappa have legitimate use cases — the decision depends on your team size, accuracy requirements, and reprocessing frequency.

Lambda Architecture

Lambda architecture processes data through two parallel paths that merge at the serving layer:

Kappa Architecture

Kappa architecture simplifies by using a single streaming path for both real-time processing and historical reprocessing:

| Factor | Lambda | Kappa |
|---|---|---|
| Complexity | High (two code paths) | Lower (one code path) |
| Reprocessing | Accurate, efficient Spark batch | Replay from Kafka / data lake |
| Team size fit | Large teams with dedicated streaming + batch engineers | Smaller teams, faster iteration |
| Latency profile | Dual: real-time + batch correction | Stream-first, eventual consistency |
| Use when | Accuracy is critical, frequent backfills | Speed-to-market, simpler ops |

3. Ingestion Layer: Kafka, Kinesis & Pulsar

The ingestion layer is the system's foundation. Every downstream component depends on it being reliable, ordered (within partitions), and fast. At 115K+ events/second, small design mistakes become catastrophic at scale.

Kafka Partitioning Strategy

Partition key choice affects ordering guarantees and downstream join performance:
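The mechanism is easy to demonstrate: Kafka's default partitioner hashes the message key to pick a partition, so every event with the same key lands on the same partition and stays ordered. A minimal sketch — real Kafka uses murmur2 on the key bytes; the stdlib `crc32` below just stands in for it, and the partition count is hypothetical:

```python
# Sketch of Kafka-style keyed partitioning: same key -> same partition,
# which preserves per-tenant ordering and co-locates a tenant's events
# for downstream joins. crc32 stands in for Kafka's murmur2 hash.
import zlib

NUM_PARTITIONS = 64  # hypothetical partition count for the events topic

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash -> stable partition assignment per key
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Same tenant always maps to the same partition:
assert partition_for("acme-corp") == partition_for("acme-corp")
```

Keying by tenant_id gives per-tenant ordering but risks hot partitions for very large tenants; a composite key like tenant_id + user_id spreads load at the cost of only per-user ordering.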

Event Schema (Avro with Schema Registry)

{
  "event_id": "uuid-v4",
  "tenant_id": "acme-corp",
  "user_id": "user-12345",
  "event_type": "page_view",
  "timestamp": "2026-04-11T10:30:00Z",
  "properties": {
    "page": "/dashboard",
    "duration_ms": 1250,
    "referrer": "google.com"
  },
  "session_id": "sess-xyz",
  "ip_address": "192.168.1.1"
}

Kafka vs Kinesis vs Pulsar

| Feature | Apache Kafka | Amazon Kinesis | Apache Pulsar |
|---|---|---|---|
| Max throughput | GB/sec per broker | 1 MB/sec write per shard | GB/sec per broker |
| Retention | Configurable (days–infinite) | Up to 365 days | Tiered (unlimited) |
| Ops burden | High (self-managed) | Low (managed) | High (self-managed) |
| Multi-tenancy | Topic ACLs + quotas | Stream-level | Native multi-tenancy |
| Best for | High-throughput, Kappa arch | AWS-native, serverless | Multi-region, geo-replication |

Dead Letter Queue & Idempotency
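Both patterns can be sketched together. Failed events are parked in a dead-letter queue after bounded retries instead of blocking the partition, and a seen-set on event_id makes processing idempotent under Kafka's at-least-once delivery. In-memory structures stand in for the Kafka DLQ topic and the dedup store; retry counts and names are illustrative:

```python
# DLQ + idempotent consumption sketch. In production the DLQ would be a
# Kafka topic (e.g. events.dlq) and the seen-set a TTL'd store such as
# RocksDB or Redis -- both are in-memory stand-ins here.

MAX_RETRIES = 3
dead_letter_queue = []   # stands in for a DLQ Kafka topic
processed_ids = set()    # stands in for a TTL'd dedup store

def handle(event: dict, process) -> bool:
    if event["event_id"] in processed_ids:
        return True                      # duplicate delivery: skip, don't double-count
    for _attempt in range(MAX_RETRIES):
        try:
            process(event)
            processed_ids.add(event["event_id"])
            return True
        except Exception:
            continue                     # transient failure: retry
    dead_letter_queue.append(event)      # poison message: park for inspection
    return False
```

The key property: replaying a partition after a consumer crash cannot inflate counts, and one malformed event cannot stall the entire pipeline.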

4. Stream Processing: Apache Flink at Scale

Apache Flink is the industry standard for stateful stream processing at scale. Its exactly-once semantics, rich windowing API, and SQL support make it the right choice for building the speed layer of a real-time analytics platform.

Window Types & When to Use Each
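The two most common window types differ in their assignment logic: a tumbling window owns each event exactly once, while sliding windows overlap so one event contributes to several. A plain-Python sketch of that assignment (window sizes are illustrative; Flink performs this bucketing internally):

```python
# How events map to time windows -- the assignment logic behind Flink's
# tumbling and sliding window assigners. Times are epoch seconds.

def tumbling_window(ts: int, size: int) -> tuple:
    # Each timestamp belongs to exactly ONE window of `size` seconds.
    start = (ts // size) * size
    return (start, start + size)

def sliding_windows(ts: int, size: int, slide: int) -> list:
    # Each timestamp belongs to every window [s, s + size) where s is a
    # multiple of `slide` -- i.e. size/slide overlapping windows.
    first = ((ts - size) // slide + 1) * slide
    last = (ts // slide) * slide
    return [(s, s + size) for s in range(first, last + 1, slide)]
```

Tumbling windows suit per-minute dashboard counters; sliding windows suit "events in the last 60s, refreshed every 10s" alerting; Flink additionally offers session windows, which close after a gap of inactivity.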

Flink SQL for Real-Time Aggregations

SELECT 
    tenant_id,
    event_type,
    TUMBLE_START(rowtime, INTERVAL '1' MINUTE) as window_start,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
FROM events
GROUP BY 
    tenant_id, 
    event_type,
    TUMBLE(rowtime, INTERVAL '1' MINUTE);
Apache Flink Stream Processing Pipeline — windowing, stateful operators, and exactly-once semantics. Source: mdsanwarhossain.me

Exactly-Once Semantics & State Backends

5. Batch Layer: Spark & Data Lake Architecture

In Lambda architecture, the batch layer provides the ground truth by reprocessing all raw data. Apache Spark running over a well-structured data lake handles this efficiently, correcting any errors from the speed layer while enabling long-horizon historical analysis.

Data Lake Zone Architecture

Table Format: Delta Lake vs Apache Iceberg

| Feature | Delta Lake | Apache Iceberg |
|---|---|---|
| Schema evolution | Add/rename columns | Full schema evolution + type promotions |
| Time travel | Yes (transaction log) | Yes (snapshot-based) |
| Engine support | Spark-centric; Trino/Flink improving | Spark, Trino, Flink, Hive native |
| Best for | Databricks-centric stacks | Multi-engine open data lakes |

Spark Batch Job Patterns
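The core batch-layer pattern is to recompute a day's rollup from raw events and overwrite that partition atomically, so reruns are idempotent. Sketched in plain Python for clarity — a real job would be Spark writing to Delta/Iceberg with dynamic partition overwrite, and all names here are illustrative:

```python
# Idempotent partition-overwrite rollup, the pattern a Spark daily batch
# job would follow. The dict stands in for a partitioned lake table.
from collections import Counter

daily_rollups = {}  # partition key (day, tenant_id) -> event_count

def backfill_day(day: str, raw_events: list) -> None:
    # Recompute the day's aggregates from raw data...
    counts = Counter(
        (day, e["tenant_id"])
        for e in raw_events
        if e["timestamp"].startswith(day)
    )
    # ...then REPLACE the partition rather than appending, so a failed
    # or duplicated run can never double-count.
    for key in [k for k in daily_rollups if k[0] == day]:
        del daily_rollups[key]
    daily_rollups.update(counts)
```

Because the job is a pure function of the raw zone, it also serves as the Lambda correction path: rerunning it silently repairs any speed-layer drift for that day.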

6. OLAP Storage: ClickHouse vs Apache Druid vs Pinot

The OLAP storage layer is where query performance lives. Your choice of ClickHouse, Apache Druid, or Apache Pinot determines the types of queries you can serve sub-second, the operational complexity you accept, and the cost structure at scale.

| Feature | ClickHouse | Apache Druid | Apache Pinot |
|---|---|---|---|
| Query latency | Sub-second | Sub-second | Sub-second |
| Ingestion mode | Batch + streaming | Native streaming | Native streaming |
| SQL support | Full ANSI SQL | Limited SQL | Limited SQL |
| Joins | Full joins supported | Limited joins | Limited joins |
| Data freshness | Seconds | Seconds | Seconds |
| Cardinality | High cardinality OK | HLL approximations | HLL approximations |
| Compression | 10:1 typical | 8:1 typical | 8:1 typical |
| Horizontal scaling | Sharding + replication | Tier-based (historical/real-time) | Tier-based |
| Best for | Ad-hoc SQL analytics | Time-series, dashboards | Pinpoint low-latency queries |
| Used by | Cloudflare, Yandex | Uber, Netflix | LinkedIn, Uber |

ClickHouse Table DDL: MergeTree Engine

The ORDER BY clause is the most critical design decision in ClickHouse — it becomes the primary index and determines which query patterns will be sub-second:

CREATE TABLE events (
    event_id UUID,
    tenant_id String,
    event_type LowCardinality(String),
    user_id String,
    timestamp DateTime,
    properties String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, event_type, timestamp)
SETTINGS index_granularity = 8192;

Key design choices: LowCardinality(String) for event_type (reduces memory by 3–5×), monthly partitioning for efficient data retention management, and ORDER BY starting with tenant_id to isolate scans per tenant.

7. Query Layer & API Design

The query layer abstracts the storage tier from consumers, enabling intelligent routing between materialized views (fast) and raw tables (flexible), caching to reduce OLAP load, and multi-tenancy enforcement at the API boundary.
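The routing decision itself is simple: send each query to the cheapest table whose granularity still satisfies it, and fall back to the raw table only when no rollup matches. A hedged sketch — the table names mirror the materialized views discussed later and are illustrative:

```python
# Route a dashboard query to the coarsest pre-aggregate that can answer
# it; raw-table scans are the last resort. Table names are illustrative.

ROLLUPS = [                       # (table, granularity in seconds), coarsest first
    ("events_daily", 86_400),
    ("events_hourly_mv", 3_600),
    ("events_minute_mv", 60),
]

def route(granularity_s: int) -> str:
    for table, g in ROLLUPS:
        if granularity_s % g == 0:
            return table          # rollup granularity divides the requested one
    return "events"               # no rollup fits: scan the raw table
```

A daily-trend chart thus hits the daily rollup, an hourly chart the hourly MV, and only truly ad-hoc granularities pay for a raw scan.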

REST API Design

Caching & Query Routing

GraphQL for Flexible Ad-Hoc Queries

GraphQL enables frontend teams to request exactly the metrics and dimensions they need without requiring backend changes for each new dashboard widget. Use persisted queries in production to prevent expensive arbitrary queries that could scan terabytes of data. Pair with query complexity limits (max depth = 5, max nodes = 50) to protect the OLAP tier.
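The complexity guardrails amount to one recursive walk over the parsed query before execution. A minimal sketch operating on a nested-dict representation of the selection tree (the representation and limits mirror the text; a real server would hook this into its GraphQL library's validation phase):

```python
# Reject a query whose nesting depth or total node count exceeds a
# budget, before it ever reaches the OLAP tier.

MAX_DEPTH, MAX_NODES = 5, 50

def check(node: dict, depth: int = 1) -> int:
    # Returns the total node count; raises if either budget is exceeded.
    if depth > MAX_DEPTH:
        raise ValueError("query too deep")
    nodes = 1
    for child in node.get("selections", []):
        nodes += check(child, depth + 1)
    if nodes > MAX_NODES:
        raise ValueError("query too large")
    return nodes
```

Combined with persisted queries, this bounds the worst-case work any single dashboard request can impose on ClickHouse or Druid.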

8. Materialized Views & Pre-Aggregation Strategies

Pre-aggregation is the single highest-leverage optimization for analytics platform query performance. At 10B events/day, scanning raw events for every dashboard query is infeasible. Materialized views trade storage for query latency, and with 10:1 compression, the storage cost is manageable.

ClickHouse Materialized View: Hourly Rollup

CREATE MATERIALIZED VIEW events_hourly_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (tenant_id, event_type, hour)
AS SELECT
    tenant_id,
    event_type,
    toStartOfHour(timestamp) AS hour,
    countState() AS event_count,
    uniqState(user_id) AS unique_users,
    avgState(response_time_ms) AS avg_response_time
FROM events
GROUP BY tenant_id, event_type, hour;

Note the engine choice: SummingMergeTree only sums plain numeric columns, so uniq() and avg() results would be silently corrupted by background merges. AggregatingMergeTree with the -State combinators stores mergeable aggregate states instead; queries against the view must use the matching -Merge combinators (countMerge, uniqMerge, avgMerge) with the same GROUP BY keys.

Tiered Aggregation Strategy

| Tier | Granularity | Retention | Use case |
|---|---|---|---|
| Live | Second-level (Flink) | 1 hour | Real-time alerting, live counters |
| Minute | 1-minute rollup (ClickHouse MV) | 7 days | Short-range dashboard sparklines |
| Hour | 1-hour rollup (ClickHouse MV) | 90 days | Dashboard primary time series |
| Day | Daily rollup (Spark batch) | 3 years | Historical trend analysis, reports |
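Tier selection follows directly from the retention column: serve a query from the finest tier whose retention still covers the requested lookback. A small sketch (durations in hours, values mirroring the table above):

```python
# Choose the finest tier whose retention covers the query's lookback.
# Retention values mirror the tiered-aggregation table; hours throughout.

TIERS = [                          # (name, granularity, retention_hours)
    ("live",   "second", 1),
    ("minute", "1m",     7 * 24),
    ("hour",   "1h",     90 * 24),
    ("day",    "1d",     3 * 365 * 24),
]

def tier_for(lookback_hours: float) -> str:
    for name, _granularity, retention in TIERS:
        if lookback_hours <= retention:
            return name
    return "day"                   # beyond all retentions: coarsest tier
```

This is why a "last 30 minutes" widget reads Flink state while a "last quarter" report reads the hourly rollup, without either path ever touching raw events.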

Cache Invalidation Strategy

Event-driven cache busting using Flink: when a new minute-window closes, publish a cache invalidation event to a Kafka topic; query servers consume it and evict the relevant cache keys for the affected tenant_id + time_bucket combinations. This avoids polling-based TTL staleness while keeping cache hit rates above 70% for repeat dashboard loads.
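The query-server side of that scheme is a prefix eviction keyed on tenant and time bucket. A sketch with a dict standing in for the cache and a direct function call standing in for the Kafka consume loop (the key format is illustrative):

```python
# Event-driven cache eviction: when a window-closed event arrives on the
# invalidation topic, drop every cached entry for that tenant + bucket.
# The dict stands in for Redis/in-process cache; key format is illustrative.

cache = {}  # "tenant:bucket:query_hash" -> cached dashboard payload

def on_window_closed(tenant_id: str, time_bucket: str) -> None:
    prefix = f"{tenant_id}:{time_bucket}"
    for key in [k for k in cache if k.startswith(prefix)]:
        del cache[key]  # next dashboard load recomputes from the fresh MV
```

Only the affected tenant's keys are dropped, so one tenant's new data never flushes another tenant's warm cache.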

9. Reliability, Data Quality & SLOs

An analytics platform that loses events silently is worse than one that fails loudly — customers make business decisions based on your data. Data quality must be enforced at every layer, and SLOs must be defined before the system is built, not after.

Data Quality Checks

SLO Definitions

Production SLOs

  • Ingestion availability: 99.99% measured over rolling 30 days (<4.4 minutes downtime/month)
  • P99 dashboard load time: <2 seconds for pre-aggregated queries
  • Data freshness: <5 seconds end-to-end for real-time dashboard data
  • Query success rate: >99.9% of queries return results within timeout
  • Data completeness: <0.01% event loss rate measured via end-to-end reconciliation
  • Error budget policy: Freeze all non-critical deployments when monthly error budget is 50% consumed; incident review required at 75% consumed
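The availability figures above are worth verifying: over a 30-day window, a 99.99% SLO leaves roughly 4.3 minutes of error budget, and the deployment freeze trips at half of it.

```python
# Deriving the SLO numbers: 99.99% over 30 days -> ~4.3 min of downtime
# budget per month; the freeze policy triggers at 50% consumption.

SLO = 0.9999
MINUTES_PER_30_DAYS = 30 * 24 * 60          # 43,200

budget_minutes = MINUTES_PER_30_DAYS * (1 - SLO)
freeze_at = budget_minutes * 0.5

print(f"error budget: {budget_minutes:.2f} min/month, "
      f"freeze at {freeze_at:.2f} min consumed")
```

In practice the budget is tracked continuously, so a single 2-minute ingestion outage already freezes non-critical deployments for the rest of the month.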

Disaster Recovery

10. Real-World Case Studies

The architectural patterns described above are not theoretical — they power the analytics infrastructure at some of the most data-intensive companies in the world. Here's what the published engineering literature reveals about their choices and lessons learned.

Pinterest — Apache Druid at 1.7 Trillion Rows

Scale: 1.7 trillion rows, 50K+ queries/day for ad analytics
Architecture: Kafka → Druid real-time nodes → historical nodes
Result: Sub-second P50 latency for ad campaign performance queries; enabled real-time budget pacing for advertisers
Key lesson: Druid's time-based partitioning (segments) makes time-range queries extremely efficient for ad analytics workloads

Uber — Apache Pinot (uPinot) at 600K+ Queries/Day

Scale: 600K+ queries/day across real-time driver supply/demand analytics
Architecture: Kafka → Pinot real-time tables; batch import for historical
Result: Real-time visibility into driver availability per geo-hash enabled dynamic surge pricing optimization
Key lesson: Pinot's upsert support enables real-time fact updates (e.g., trip status changes) without full reingestion

Cloudflare — ClickHouse at 6M Events/Second

Scale: 6+ million HTTP log events per second across global edge network
Architecture: Direct ClickHouse ingestion via the HTTP interface; ReplicatedMergeTree with ZooKeeper-coordinated replication
Result: Full SQL analytics on DNS and HTTP traffic; sub-second dashboards for security & DDoS detection
Key lesson: Full ANSI SQL support was the key ClickHouse differentiator — enabled ad-hoc security investigations without pre-defined query schemas

Netflix — Druid + Apache Iceberg for A/B Testing

Scale: Viewer engagement analytics across 200M+ subscribers; A/B test result dashboards
Architecture: Kafka → Flink → Iceberg data lake (truth) + Druid hot tier (dashboards)
Result: Recommendation model metrics available within 2 minutes of viewing events; A/B test results queryable in near real-time
Key lesson: Iceberg + Druid combination gives the best of both worlds — Iceberg for reliable batch reprocessing and schema evolution; Druid for sub-second dashboard serving

Lessons Learned Across All Case Studies

  • Pre-aggregate aggressively: Across all four companies, 80%+ of queries hit materialized views or pre-computed rollups — raw table scans are the exception, not the rule
  • Schema design is critical: The ORDER BY / sort key in ClickHouse (and segment granularity in Druid) affects query performance by 10× or more; changing it requires full data reingestion
  • Kafka retention = safety net: Keep a minimum of 7 days of Kafka retention (or use Tiered Storage for unlimited retention) to allow replaying the speed layer without accessing the data lake — this has saved multiple production incidents
  • Start with Kappa, graduate to Lambda: Most teams start with a single Flink pipeline (Kappa) and add a Spark batch layer (Lambda) only when reprocessing accuracy becomes a measurable business requirement
  • Monitor data freshness as an SLO: Treat end-to-end data latency (event time to dashboard visibility) as a first-class SLO with alerting; silent lag is the most dangerous failure mode in analytics systems

Md Sanwar Hossain

Software Engineer · Java · Spring Boot · Microservices · AI/LLM Systems

Last updated: April 11, 2026