Apache Paimon
Apache Paimon (formerly Flink Table Store) is a streaming-first open table format that originated in the Apache Flink community and graduated to a top-level Apache project in 2024. Where Hudi, Iceberg, and Delta evolved from batch-first roots, Paimon was designed from day one for high-frequency CDC and real-time ingest, using an LSM-tree storage layer rather than the snapshot-of-Parquet-files model.
Key Features:
- LSM-Tree Storage. Sorted runs and background compaction give cheap, frequent updates — the right shape for CDC streams that change a small fraction of rows per second.
- Streaming Source & Sink. First-class Flink integration; tables work as streaming sources with sub-minute latency and as sinks for exactly-once writes.
- Changelog Production. Reading a table as a changelog (insert / update / delete events) is a primitive operation, not a derived one.
- Primary Key Tables. Natural UPSERT and DELETE semantics by primary key, like a database.
- Multi-Engine Support. Flink, Spark, Trino, StarRocks, Doris, and Hive can all read Paimon tables.
- Hive Metastore Compatible. Works with existing Hive / AWS Glue catalogs, easing adoption.
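To make the LSM-tree and primary-key bullets concrete, here is a minimal sketch (not the Paimon API, just the storage idea): writes land in a memtable, flushes produce immutable sorted runs, reads merge runs newest-first so the latest value per key wins, and compaction folds runs together and drops delete tombstones. All names here are hypothetical.

```python
import bisect

TOMBSTONE = object()  # marks a deleted key inside a run

class LsmTable:
    """Toy LSM-style primary-key table: memtable + sorted runs + compaction."""

    def __init__(self, flush_threshold=4):
        self.memtable = {}       # key -> value (or TOMBSTONE)
        self.runs = []           # newest run first; each run is a sorted list of (key, value)
        self.flush_threshold = flush_threshold

    def upsert(self, key, value):
        self.memtable[key] = value
        self._maybe_flush()

    def delete(self, key):
        self.memtable[key] = TOMBSTONE   # deletes are cheap: just a marker
        self._maybe_flush()

    def _maybe_flush(self):
        # Flushing is an append of one sorted file; no existing data is rewritten,
        # which is why frequent small commits stay cheap.
        if len(self.memtable) >= self.flush_threshold:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            v = self.memtable[key]
            return None if v is TOMBSTONE else v
        for run in self.runs:            # newest run shadows older ones
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                v = run[i][1]
                return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Background full compaction: merge all runs, keep the newest value
        # per key, and physically drop tombstoned rows.
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newer values overwrite
            for k, v in run:
                merged[k] = v
        self.runs = [sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)]
```

Usage: two small commits produce two sorted runs, reads still see the latest row per key, and compaction collapses everything into one run.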
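The changelog bullet can also be sketched in a few lines. Flink tags each row with a RowKind: `+I` (insert), `-U`/`+U` (update before/after), `-D` (delete); in Paimon, emitting these is configured per table (the `changelog-producer` option). The function below is a hypothetical simulation against in-memory state, not Paimon's implementation.

```python
def changelog(events, state=None):
    """Derive RowKind-tagged change events from primary-key upserts/deletes.

    events: iterable of ('upsert' | 'delete', key, value) tuples.
    """
    state = {} if state is None else state
    out = []
    for op, key, value in events:
        if op == "upsert":
            if key not in state:
                out.append(("+I", key, value))       # first write: insert
            else:
                out.append(("-U", key, state[key]))  # retract the old value
                out.append(("+U", key, value))       # emit the new value
            state[key] = value
        elif op == "delete" and key in state:
            out.append(("-D", key, state.pop(key)))  # retract the whole row
    return out
```

Downstream streaming jobs (aggregations, joins) can consume this event stream directly instead of re-deriving diffs between snapshots.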
Paimon vs. Hudi vs. Iceberg vs. Delta:
- Paimon — LSM-tree, streaming-native, primary-key UPSERT first.
- Hudi — Originally streaming-friendly via merge-on-read; a broader ecosystem of table services (compaction, clustering, indexing).
- Iceberg — Snapshot-of-Parquet, batch-first, becoming streaming-capable with V2 deletes.
- Delta — Snapshot-of-Parquet plus transaction log; strong batch + Spark Streaming.
Use Cases:
- CDC ingest from operational databases at high update rates (thousands of upserts/sec per table).
- Real-time materialized views in a Flink-centric lakehouse.
- Streaming joins between fact and dimension tables with sub-minute freshness.
- Workloads where Hudi’s merge-on-read isn’t fast enough on tiny commits.