Apache Cassandra

Apache Cassandra is the canonical wide-column / Dynamo-style distributed database. Originally developed at Facebook for Inbox Search, open-sourced in 2008, and now the storage layer behind Netflix, Apple, Discord, Instagram, and Uber. Cassandra is built for write-heavy, multi-region workloads where availability and linear scalability trump strong consistency.

Key Features:

Masterless Peer-to-Peer. Every node is identical; no leader, no single point of failure. Token-based partitioning (Murmur3 hash) places data uniformly across the ring.
Tunable Consistency. Per-query: ONE, QUORUM, LOCAL_QUORUM, ALL. Trade latency for consistency at the operation level.
CQL. SQL-like query language — SELECT, INSERT, UPDATE — over a wide-column data model. No JOINs, no aggregations across partitions.
LSM-Tree Storage. Append-only commit log + memtable + immutable SSTables, with background compaction. Writes are extremely cheap.
Multi-Region Replication. Native multi-DC topology with per-DC replication factor; LOCAL_QUORUM reads stay in-region.
Linear Scalability. Add nodes; throughput grows nearly linearly. No re-sharding ceremony.
Materialized Views & Secondary Indexes. Available but discouraged on hot tables; the recommended pattern is denormalize at write time.

Data Model:

A Cassandra table is keyed by a partition key (which determines which node owns the data) plus optional clustering columns (which determine sort order within the partition). The right partition-key choice is the single most important design decision — bad keys produce hot partitions or cross-partition scans, both of which are pathological in Cassandra.

Use Cases:

Time-series telemetry — metrics, logs, IoT events — partitioned by device + day.
Massive-fan-out social timelines — per-user feed tables, written at follow-time.
Multi-region active/active deployments where regional outages must not affect availability.
Write-heavy event ingestion at hundreds of thousands of writes/sec.

Cassandra vs. Other NoSQL:

vs. MongoDB — Cassandra wins on write throughput, multi-region availability, and linear scale. MongoDB wins on flexible querying, secondary indexes, and developer ergonomics.
vs. ScyllaDB — ScyllaDB is a C++ / Seastar rewrite of Cassandra; same protocol, same data model, much higher per-node throughput, lower p99.
vs. HBase — Cassandra is masterless and multi-region-native; HBase is master-coordinated and Hadoop-coupled.