Apache Cassandra
Apache Cassandra is the canonical wide-column / Dynamo-style distributed database. Originally developed at Facebook for Inbox Search, open-sourced in 2008, and now the storage layer behind Netflix, Apple, Discord, Instagram, and Uber. Cassandra is built for write-heavy, multi-region workloads where availability and linear scalability trump strong consistency.
Key Features:
- Masterless Peer-to-Peer. Every node is identical; no leader, no single point of failure. Token-based partitioning (Murmur3 hash) places data uniformly across the ring.
- Tunable Consistency. Per-query:
ONE, QUORUM, LOCAL_QUORUM, ALL. Trade latency for consistency at the operation level.
- CQL. SQL-like query language —
SELECT, INSERT, UPDATE — over a wide-column data model. No JOINs, no aggregations across partitions.
- LSM-Tree Storage. Append-only commit log + memtable + immutable SSTables, with background compaction. Writes are extremely cheap.
- Multi-Region Replication. Native multi-DC topology with per-DC replication factor;
LOCAL_QUORUM reads stay in-region.
- Linear Scalability. Add nodes; throughput grows nearly linearly. No re-sharding ceremony.
- Materialized Views & Secondary Indexes. Available but discouraged on hot tables; the recommended pattern is denormalize at write time.
Data Model:
A Cassandra table is keyed by a partition key (which determines which node owns the data) plus optional clustering columns (which determine sort order within the partition). The right partition-key choice is the single most important design decision — bad keys produce hot partitions or cross-partition scans, both of which are pathological in Cassandra.
Use Cases:
- Time-series telemetry — metrics, logs, IoT events — partitioned by device + day.
- Massive-fan-out social timelines — per-user feed tables, written at follow-time.
- Multi-region active/active deployments where regional outages must not affect availability.
- Write-heavy event ingestion at hundreds of thousands of writes/sec.
Cassandra vs. Other NoSQL:
- vs. MongoDB — Cassandra wins on write throughput, multi-region availability, and linear scale. MongoDB wins on flexible querying, secondary indexes, and developer ergonomics.
- vs. ScyllaDB — ScyllaDB is a C++ / Seastar rewrite of Cassandra; same protocol, same data model, much higher per-node throughput, lower p99.
- vs. HBase — Cassandra is masterless and multi-region-native; HBase is master-coordinated and Hadoop-coupled.