Databases

A consolidated reference covering the full data stack — relational fundamentals, SQL practice, dimensional modeling, NoSQL and graph stores, ingestion and streaming pipelines, open lakehouse table formats, and modern vector databases for RAG and semantic search.

SQL

Language reference (SELECT, JOIN, window functions), query performance and EXPLAIN plans, plus interview-style worked exercises.

Relational & Modeling

RDBMS fundamentals, star/snowflake/galaxy schemas, and Kimball dimensional modeling — bus matrix, conformed dimensions, and a worked sales model.

NoSQL & Graph

The full non-relational landscape — in-memory (Redis), documents (MongoDB), wide-column (Cassandra), key-value (etcd, RocksDB), graph (Neo4j, Neptune), time-series, and search.

Pipelines

ETL and ELT patterns, large-scale ingestion, Apache NiFi flows, Kafka streaming, and Parquet columnar storage — the data movement layer.

Lakehouse

Open table formats (Hudi, Iceberg, Delta), catalogs (Polaris, Unity, Nessie), and query engines (Trino, StarRocks) — the open lakehouse stack.

Vector Databases

pgvector, Chroma, Weaviate, FAISS — embeddings, ANN indexes (HNSW, IVF, PQ), and the retrieval layer behind RAG and semantic search.

Quick Reference — Kimball & RDBMS Schemas

The rest of this page is a one-screen quick reference. For depth, follow the cards above into the section landing pages and their per-topic deep dives.

Kimball Bottom-Up Data Warehouse Architecture

Data marts first. Build small, focused marts for individual business functions — sales, marketing, finance — to address specific analytical needs quickly.
Dimensional modeling. Each mart uses a star schema: fact tables hold the numeric measurements at a declared grain, and dimension tables hold the descriptive context used to filter and group them.
Conformed dimensions. Shared dimensions such as time, geography, and product enable consistent reporting across departments.
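Enterprise Data Warehouse (EDW). As marts integrate through conformed dimensions, they collectively form the EDW. In Kimball's approach the warehouse is the union of the conformed marts rather than a separately built central store, and it supports cross-functional analytics.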
BI and analytics. The EDW feeds dashboards, reports, and ad-hoc analysis used to inform business decisions.
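
To make the conformed-dimension idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, fact_sales, fact_marketing), columns, and sample rows are hypothetical and chosen only for illustration. Two marts share one product dimension, so each fact table can be aggregated to the same category level and the results joined ("drill-across"). A real model would also conform a date dimension, which is left as a plain key here to keep the sketch short.

```python
import sqlite3


def drill_across():
    """Combine a sales mart and a marketing mart through a shared dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conformed dimension: both marts reference the same product rows.
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        -- Sales mart fact table (hypothetical grain: product per day).
        CREATE TABLE fact_sales (
            date_key    INTEGER,
            product_key INTEGER REFERENCES dim_product(product_key),
            revenue     REAL
        );
        -- Marketing mart fact table at the same grain.
        CREATE TABLE fact_marketing (
            date_key    INTEGER,
            product_key INTEGER REFERENCES dim_product(product_key),
            ad_spend    REAL
        );
        INSERT INTO dim_product VALUES
            (1, 'Laptop', 'Hardware'),
            (2, 'Mouse', 'Hardware'),
            (3, 'Antivirus', 'Software');
        INSERT INTO fact_sales VALUES
            (20240101, 1, 1000.0),
            (20240101, 2, 50.0),
            (20240102, 3, 200.0),
            (20240102, 1, 1200.0);
        INSERT INTO fact_marketing VALUES
            (20240101, 1, 100.0),
            (20240102, 3, 40.0),
            (20240102, 2, 10.0);
    """)
    # Aggregate each fact table separately, then join on the conformed
    # attribute. Joining the two fact tables directly would multiply rows.
    rows = conn.execute("""
        WITH sales AS (
            SELECT p.category AS category, SUM(f.revenue) AS revenue
            FROM fact_sales f
            JOIN dim_product p ON p.product_key = f.product_key
            GROUP BY p.category
        ),
        spend AS (
            SELECT p.category AS category, SUM(f.ad_spend) AS ad_spend
            FROM fact_marketing f
            JOIN dim_product p ON p.product_key = f.product_key
            GROUP BY p.category
        )
        SELECT s.category, s.revenue, m.ad_spend
        FROM sales s
        JOIN spend m ON m.category = s.category
        ORDER BY s.category
    """).fetchall()
    conn.close()
    return rows
```

Calling drill_across() returns one row per category with revenue and ad spend side by side. The comparison is only meaningful because both marts agree on what a product and a category are, which is the point of conforming the dimension.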

Common RDBMS Schemas

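Star Schema. Central fact table connected directly to denormalized dimension tables. Every dimension is one join away from the fact table, so queries need fewer joins and are usually simpler and faster than on a snowflake schema.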
Snowflake Schema. Star schema with normalized dimensions. Reduces redundancy at the cost of more joins.
Galaxy Schema (Fact Constellation). Multiple fact tables share conformed dimension tables — for warehouses spanning multiple business processes.
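Hierarchical Schema. Tree-structured data in which each record has a single parent. Useful for organizational charts and nested catalogs; in a relational database it is typically modeled with a self-referencing foreign key.
Network Schema. Like hierarchical, but a record can have several parents, which supports many-to-many relationships. For non-hierarchical, interconnected entities; in a relational database this is typically modeled with junction tables.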
Flat Schema. A single table without hierarchy. Suitable for small, self-contained datasets.
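
The star versus snowflake trade-off can be sketched in a few lines with Python's built-in sqlite3 module. All table names, column names, and rows below are hypothetical. The same product dimension is stored twice: once denormalized (star) and once with the category split into its own table (snowflake). Both queries give the same totals, but the snowflake version needs one extra join.

```python
import sqlite3


def compare_star_and_snowflake():
    """Return (star_rows, snowflake_rows) for the same category rollup."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Star: the category name is repeated on every product row.
        CREATE TABLE dim_product_star (
            product_key   INTEGER PRIMARY KEY,
            product_name  TEXT,
            category_name TEXT
        );
        -- Snowflake: the category is normalized into its own table.
        CREATE TABLE dim_category (
            category_key  INTEGER PRIMARY KEY,
            category_name TEXT
        );
        CREATE TABLE dim_product_snow (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category_key INTEGER REFERENCES dim_category(category_key)
        );
        -- One fact table, shared by both variants for comparison.
        CREATE TABLE fact_sales (
            product_key INTEGER,
            quantity    INTEGER
        );
        INSERT INTO dim_product_star VALUES
            (1, 'Laptop', 'Hardware'),
            (2, 'Mouse', 'Hardware'),
            (3, 'Antivirus', 'Software');
        INSERT INTO dim_category VALUES (10, 'Hardware'), (20, 'Software');
        INSERT INTO dim_product_snow VALUES
            (1, 'Laptop', 10),
            (2, 'Mouse', 10),
            (3, 'Antivirus', 20);
        INSERT INTO fact_sales VALUES (1, 2), (2, 5), (3, 1), (1, 1);
    """)
    # Star schema: a single join reaches the category attribute.
    star_rows = conn.execute("""
        SELECT d.category_name, SUM(f.quantity)
        FROM fact_sales f
        JOIN dim_product_star d ON d.product_key = f.product_key
        GROUP BY d.category_name
        ORDER BY d.category_name
    """).fetchall()
    # Snowflake schema: the same rollup needs a second join.
    snowflake_rows = conn.execute("""
        SELECT c.category_name, SUM(f.quantity)
        FROM fact_sales f
        JOIN dim_product_snow p ON p.product_key = f.product_key
        JOIN dim_category c ON c.category_key = p.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()
    conn.close()
    return star_rows, snowflake_rows
```

In the star variant, renaming a category means updating every product row that carries it. In the snowflake variant it is a single-row update in dim_category, which is the redundancy reduction that the extra join pays for.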