Unity Catalog
Unity Catalog is Databricks’ unified governance layer for the lakehouse. Open-sourced in mid-2024 as Unity Catalog OSS (Apache 2.0), it is the default catalog on Databricks, and its open APIs give a growing set of non-Databricks engines access to the same governed data. Unity is broader than a table catalog: it governs tables, volumes (files), ML models, AI tools, and SQL functions in one namespace hierarchy.
Key Features:
- Three-Level Namespace. catalog.schema.object replaces the two-level database.table naming of the Hive Metastore. Catalogs map naturally to environments, business units, or regions.
- Multi-Asset Governance. Tables (Delta + Iceberg), file volumes, registered ML models, AI agents, and functions all live under the same RBAC and lineage system.
- Iceberg + Delta Coexistence. Unity Catalog OSS supports both formats; with UniForm enabled, a Delta table also publishes Iceberg metadata over the same data files, so Iceberg clients can read it without a copy.
- Fine-Grained Access Control. Row filters, column masks, dynamic views, attribute-based policies.
- Lineage & Audit. Automatic table-level and column-level lineage across notebooks, jobs, and dashboards.
- Open APIs. Through the Iceberg REST catalog API and the Delta Sharing protocol, Trino, Spark, Snowflake, and other engines can authenticate and read governed data without running on Databricks compute.
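The namespace and the shared grant model (Three-Level Namespace, Multi-Asset Governance) can be illustrated with a small in-memory sketch. It expands one-, two-, and three-part names against a current catalog and schema, then checks a read the way Unity Catalog's privilege model is usually described: USE CATALOG on the catalog, USE SCHEMA on the schema (or inherited from the catalog), and SELECT on the table or any ancestor. The names `resolve`, `Grants`, and `can_select`, and the `main`/`default` defaults, are hypothetical; the sketch ignores ownership, groups, and admin roles and is not the Unity Catalog API.

```python
from dataclasses import dataclass, field

# Hypothetical model of Unity Catalog's securable hierarchy; not the real API.


def resolve(name: str, current_catalog: str = "main",
            current_schema: str = "default") -> tuple[str, str, str]:
    """Expand a 1-, 2-, or 3-part name into (catalog, schema, object)."""
    parts = name.split(".")
    if len(parts) == 1:
        return (current_catalog, current_schema, parts[0])
    if len(parts) == 2:
        return (current_catalog, parts[0], parts[1])
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    raise ValueError(f"not a valid object name: {name!r}")


@dataclass
class Grants:
    # securable ("cat", "cat.schema", "cat.schema.obj") -> principal -> privileges
    acl: dict[str, dict[str, set[str]]] = field(default_factory=dict)

    def grant(self, securable: str, principal: str, privilege: str) -> None:
        self.acl.setdefault(securable, {}).setdefault(principal, set()).add(privilege)

    def has(self, securable: str, principal: str, privilege: str) -> bool:
        return privilege in self.acl.get(securable, {}).get(principal, set())


def can_select(grants: Grants, principal: str, name: str) -> bool:
    """Check whether principal may read a table, with downward inheritance."""
    cat, sch, obj = resolve(name)
    catalog, schema, table = cat, f"{cat}.{sch}", f"{cat}.{sch}.{obj}"
    if not grants.has(catalog, principal, "USE CATALOG"):
        return False
    # USE SCHEMA may be granted on the schema itself or inherited from the catalog.
    if not any(grants.has(s, principal, "USE SCHEMA") for s in (catalog, schema)):
        return False
    # SELECT granted on the catalog or schema is inherited by every table below it.
    return any(grants.has(s, principal, "SELECT") for s in (catalog, schema, table))
```

For example, granting `analysts` USE CATALOG and USE SCHEMA on `sales` plus SELECT on `sales.emea` lets them read `sales.emea.orders` but not `sales.apac.orders`, because the SELECT grant only flows down from the schema it was placed on.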
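Row filters and column masks are attached to a table as SQL UDFs (`ALTER TABLE ... SET ROW FILTER` and `ALTER TABLE ... ALTER COLUMN ... SET MASK`) and are evaluated per query based on who is running it. The Python sketch below only mirrors those semantics on in-memory rows; `apply_policies`, `region_filter`, `ssn_mask`, and the group names are hypothetical, and real policies typically test membership with `is_account_group_member()` inside the UDF.

```python
from typing import Callable, Optional

Row = dict[str, object]
RowFilter = Callable[[Row, set[str]], bool]
ColumnMask = Callable[[object, set[str]], object]


def apply_policies(rows: list[Row], groups: set[str],
                   row_filter: Optional[RowFilter] = None,
                   column_masks: Optional[dict[str, ColumnMask]] = None) -> list[Row]:
    """Drop rows the filter rejects, then mask protected columns for this caller."""
    masks = column_masks or {}
    visible = []
    for row in rows:
        if row_filter is not None and not row_filter(row, groups):
            continue
        visible.append({col: masks[col](val, groups) if col in masks else val
                        for col, val in row.items()})
    return visible


def region_filter(row: Row, groups: set[str]) -> bool:
    # Admins see every row; everyone else sees only regions they are a member of.
    return "admins" in groups or f"region_{row['region']}" in groups


def ssn_mask(value: object, groups: set[str]) -> object:
    # HR sees the full value; everyone else sees only the last four characters.
    return value if "hr" in groups else "***-**-" + str(value)[-4:]
```

With two rows (regions `us` and `eu`), a caller in `region_us` sees only the `us` row with its SSN rendered as `***-**-6789`, while a caller in both `admins` and `hr` sees every row unmasked. The filter runs before the masks, so masked columns never leak through rows the caller cannot see.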
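Lineage is captured automatically, but the data it produces is essentially a directed graph of column-to-column edges. A minimal sketch of how that graph answers audit questions, assuming a hypothetical `LineageGraph` class and four-part `catalog.schema.table.column` identifiers (Unity Catalog exposes lineage through its own UI and APIs, not through this class):

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical column-level lineage store (not a Unity Catalog API)."""

    def __init__(self) -> None:
        self.parents: dict[str, set[str]] = defaultdict(set)
        self.children: dict[str, set[str]] = defaultdict(set)

    def record(self, sources: list[str], target: str) -> None:
        # One write (e.g. a job's INSERT ... SELECT) links each source column to the target.
        for src in sources:
            self.parents[target].add(src)
            self.children[src].add(target)

    @staticmethod
    def _walk(edges: dict[str, set[str]], start: str) -> set[str]:
        # Breadth-first traversal collecting every node reachable from start.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, column: str) -> set[str]:
        return self._walk(self.parents, column)

    def downstream(self, column: str) -> set[str]:
        return self._walk(self.children, column)
```

Calling `downstream` on a sensitive column such as `raw.crm.customers.email` lists every derived column it reached, which is the kind of question column-level audit requirements tend to ask; `upstream` answers the reverse question for a dashboard metric.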
Architecture:
Unity Catalog runs as a metadata service backed by a relational store. Compute engines (Databricks clusters, Trino, Spark) authenticate to UC, request access, and receive short-lived, scoped cloud credentials along with metadata pointers. External locations and storage credentials abstract away object storage, so end users work with logical names rather than raw storage paths (such as s3:// URIs) or the cloud IAM roles that grant access to them.
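The credential-vending flow described above can be sketched as a toy metadata service. It checks the caller's grant, confirms that the table's storage path sits under a registered external location, and returns the path with a short-lived credential scoped to it. `MetadataService`, `VendedCredential`, and the 15-minute TTL are hypothetical; a real deployment vends cloud-specific temporary tokens and keeps its state in the relational store, not in Python dicts.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VendedCredential:
    path_prefix: str   # storage path the credential is scoped to
    operation: str     # e.g. "READ"
    expires_at: float  # epoch seconds


class MetadataService:
    """Hypothetical stand-in for a catalog that vends scoped credentials."""

    def __init__(self, external_locations: dict[str, str], tables: dict[str, str],
                 readers: dict[str, set[str]], ttl_seconds: int = 900) -> None:
        self.external_locations = external_locations  # logical name -> storage URL prefix
        self.tables = tables                          # full table name -> storage path
        self.readers = readers                        # full table name -> principals with SELECT
        self.ttl = ttl_seconds

    def open_table(self, principal: str, full_name: str,
                   now: Optional[float] = None) -> tuple[str, VendedCredential]:
        now = time.time() if now is None else now
        if principal not in self.readers.get(full_name, set()):
            raise PermissionError(f"{principal} cannot read {full_name}")
        path = self.tables[full_name]
        # Require a full path-segment match so "s3://lake/sales" does not cover "s3://lake/sales_old".
        if not any(path.startswith(prefix.rstrip("/") + "/")
                   for prefix in self.external_locations.values()):
            raise LookupError(f"{path} is not under a registered external location")
        return path, VendedCredential(path, "READ", now + self.ttl)
```

The engine never holds a long-lived key: it gets a credential that covers only the table's path and expires after the TTL, which is why it must come back to the catalog, and pass the grant check again, for later reads.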
Unity Catalog vs. Apache Polaris:
- Unity Catalog is broader: tables + files + models + functions, with deep Databricks integration.
- Polaris is narrower: Iceberg-only, REST-API-first, vendor-neutral.
- They overlap on Iceberg governance; in practice, some organizations run both and federate between them through the Iceberg REST API.
Use Cases:
- Multi-workspace governance across an enterprise Databricks deployment.
- Unified RBAC over tables, ML models, and unstructured files.
- Cross-engine sharing of Delta and Iceberg tables to Trino, Snowflake, and partners.
- Compliance regimes that require column-level audit and lineage.