StarRocks & Apache Doris
StarRocks and Apache Doris are open-source MPP (massively parallel processing) analytical databases that share a common lineage: both descend from Palo, the engine Baidu built and later donated to the Apache Software Foundation as Doris. Doris has since graduated to an Apache top-level project; StarRocks forked from the Doris codebase in 2020 and went its own direction, with CelerData (formerly StarRocks, Inc.) as its main commercial backer. Both are positioned as low-latency, high-concurrency alternatives to Trino for interactive BI on a lakehouse, with native readers for Iceberg, Hudi, Delta, and Paimon.
Key Features:
- Vectorized MPP Execution. Columnar in-memory operators, SIMD pipelines, and runtime filters, aimed at sub-second response on multi-billion-row queries.
- Native Lakehouse Readers. First-class connectors for Iceberg, Hudi, Delta, Paimon, and Hive Metastore. No extra ETL into a proprietary format required.
- Materialized Views. Both engines can transparently rewrite queries written against base tables so that they read from a matching materialized view, and both support incremental refresh of those views.
- High Concurrency. Tuned for hundreds-to-thousands of concurrent BI users where Trino is typically tuned for tens.
- Internal Storage Tier. Optional native storage for hot data; cold data stays on the lake. Hybrid hot/cold queries in one statement.
- Standard SQL. ANSI-style SQL with MySQL-flavored extensions; clients connect over the MySQL wire protocol, including through MySQL JDBC drivers.
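To make the lakehouse-reader, materialized-view, and hot/cold features above more concrete, here is a minimal Python sketch that only builds the SQL an operator might issue. The statements follow StarRocks-style syntax (Doris uses a similar but not identical dialect), and every catalog, database, table, and column name, as well as the metastore URI, is a hypothetical placeholder rather than anything taken from a real deployment.

```python
"""Sketch: SQL for querying an Iceberg table in place (StarRocks-style syntax).

All identifiers (lake, sales, orders, order_date, amount) and the metastore
URI are hypothetical placeholders chosen for illustration.
"""


def create_iceberg_catalog(name: str, metastore_uri: str) -> str:
    """DDL that registers an Iceberg catalog backed by a Hive Metastore."""
    return (
        f"CREATE EXTERNAL CATALOG {name}\n"
        "PROPERTIES (\n"
        '  "type" = "iceberg",\n'
        '  "iceberg.catalog.type" = "hive",\n'
        f'  "hive.metastore.uris" = "{metastore_uri}"\n'
        ")"
    )


def create_async_mv(mv_name: str, source_table: str) -> str:
    """DDL for an asynchronously refreshed materialized view over a lake table."""
    return (
        f"CREATE MATERIALIZED VIEW {mv_name}\n"
        "DISTRIBUTED BY HASH(order_date)\n"
        "REFRESH ASYNC EVERY (INTERVAL 1 HOUR)\n"
        "AS\n"
        "SELECT order_date, SUM(amount) AS revenue\n"
        f"FROM {source_table}\n"
        "GROUP BY order_date"
    )


def hybrid_query(hot_table: str, cold_table: str) -> str:
    """One statement that reads hot rows from internal storage and cold rows from the lake."""
    return (
        "SELECT order_date, SUM(amount) AS revenue\n"
        "FROM (\n"
        f"  SELECT order_date, amount FROM {hot_table}\n"
        "  UNION ALL\n"
        f"  SELECT order_date, amount FROM {cold_table}\n"
        ") AS all_orders\n"
        "GROUP BY order_date"
    )


if __name__ == "__main__":
    statements = [
        create_iceberg_catalog("lake", "thrift://metastore.example.internal:9083"),
        create_async_mv("daily_revenue_mv", "lake.sales.orders"),
        hybrid_query("default_catalog.sales.orders_recent", "lake.sales.orders"),
    ]
    for stmt in statements:
        print(stmt + ";\n")
```

The three-part `catalog.database.table` names are what let a single statement span the internal tier and the lake. The refresh interval and distribution key are arbitrary choices for the sketch and would need tuning for a real workload.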
StarRocks vs. Apache Doris:
- StarRocks — More aggressive on materialized-view automation, lakehouse acceleration features, and cloud-native storage-compute separation. Backed by a single vendor (CelerData).
- Apache Doris — Apache governance, a broader contributor base, and a more conservative roadmap. Strong adoption in China and growing in the EU.
- Compatibility. Both speak the MySQL wire protocol, so BI tools that work with one usually work with the other.
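Because both engines accept MySQL-protocol clients, the same client code can usually target either one. The sketch below assumes the third-party `pymysql` driver, a frontend listening on port 9030 (the usual default query port in both projects), and a passwordless `root` user; the host name is a hypothetical placeholder, and a real deployment would use its own credentials.

```python
"""Sketch: one MySQL-protocol client for either StarRocks or Doris.

Assumptions: pymysql is installed, the frontend query port is 9030, and the
host and credentials below are placeholders, not real endpoints.
"""
from typing import Any, Dict, List, Optional, Tuple


def connection_params(
    host: str,
    port: int = 9030,
    user: str = "root",
    password: str = "",
    database: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for pymysql.connect; identical for both engines."""
    params: Dict[str, Any] = {
        "host": host,
        "port": port,
        "user": user,
        "password": password,
    }
    if database is not None:
        params["database"] = database
    return params


def run_query(params: Dict[str, Any], sql: str) -> List[Tuple[Any, ...]]:
    """Execute one statement and return all rows."""
    import pymysql  # imported lazily so the helpers above load without the driver

    conn = pymysql.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return list(cur.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical frontend host; swap in a StarRocks or Doris FE address.
    params = connection_params("fe.example.internal", database="sales")
    print(run_query(params, "SELECT 1"))
```

Only the host changes when switching engines in this sketch. SQL dialect differences (DDL properties, function names) can still surface once queries go beyond the common subset.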
StarRocks / Doris vs. Trino:
- StarRocks / Doris — Faster on dashboard-style queries, support hot caching, can run at higher concurrency. Internal storage adds operational complexity if used.
- Trino — Pure query engine, no storage to manage. Wider connector catalog (30+ sources). Simpler operationally; lower per-query latency on cold data.
Use Cases:
- Customer-facing analytics where p99 must be sub-second under load.
- Real-time dashboards combining Iceberg / Paimon CDC with sub-second queries.
- Replacing legacy Vertica / Greenplum / SQL Server cubes with an open MPP engine.
- The serving tier in a Flink + Paimon + StarRocks streaming lakehouse stack.
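The sub-second p99 target in the first use case can be checked from measured query latencies. The following is a small sketch using the nearest-rank percentile definition; the 1000 ms threshold and the sample latencies are illustrative assumptions, not benchmark results for either engine.

```python
"""Sketch: check a latency sample against a p99 service-level objective."""
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering at least pct% of the data."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(latencies_ms: Sequence[float], slo_ms: float = 1000.0, pct: float = 99.0) -> bool:
    """True when the chosen percentile is strictly below the threshold."""
    return percentile(latencies_ms, pct) < slo_ms


if __name__ == "__main__":
    # Made-up sample: 98 fast queries and 2 slow ones out of 100.
    latencies = [200.0] * 98 + [1500.0] * 2
    print("p99 (ms):", percentile(latencies, 99))
    print("meets sub-second p99:", meets_slo(latencies))
```

With 100 samples, p99 is the 99th-fastest query, so a single slow outlier is tolerated but two are not. This is why tail latency under concurrent load, rather than average latency, is the number to watch for customer-facing workloads.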