DuckLake is an integrated data lake and catalog format
We released DuckLake v1.0, a production-ready version of the DuckLake specification.
Deployed in production

- PostgreSQL
- SQLite
- DuckDB

Client

- Multiple clients
- Works locally or in the cloud
- No vendor lock-in

DuckLake

- Catalog: served by an ACID-compliant SQL database
- Storage: Parquet files, which can be stored on local disk or in object storage
DuckLake’s key features
Data lake operations
DuckLake supports snapshots, time travel queries, schema evolution and partitioning.
Lightweight snapshots
You can have as many snapshots as you want without frequent compacting steps!
ACID transactions
DuckLake allows concurrent access with ACID transactional guarantees over multi-table operations.
Performance-oriented
DuckLake uses statistics for filter pushdown, enabling fast queries even on large datasets.
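As a sketch of what time travel looks like in practice, assuming an attached DuckLake with a hypothetical table `events` (DuckDB's `AT` clause selects the snapshot to read):

```sql
-- Read the table as of an earlier snapshot (time travel).
-- `events` is a hypothetical table in an attached DuckLake.
SELECT count(*) FROM events AT (VERSION => 3);

-- Or as of a point in time:
SELECT count(*) FROM events AT (TIMESTAMP => now() - INTERVAL 1 DAY);
```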
In conversation: DuckLake v1.0
Listen to Mark Raasveldt and Pedro Holanda discuss the road that led
to the DuckLake specification and explain the features of DuckLake v1.0.
Create your first DuckLake
DuckDB provides first-class support for DuckLake and can use PostgreSQL, SQLite or DuckDB as the catalog database.
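For instance, once a DuckLake is attached, regular SQL works against it. A minimal end-to-end session using a DuckDB-backed catalog might look like this (a sketch; the `demo` table is illustrative):

```sql
INSTALL ducklake;
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'data/');
USE my_ducklake;

-- Regular SQL against the lake: data lands in Parquet files under 'data/',
-- metadata in the catalog database.
CREATE TABLE demo (id INTEGER, name VARCHAR);
INSERT INTO demo VALUES (1, 'duck'), (2, 'goose');
SELECT * FROM demo;
```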
With DuckDB as the catalog database:

```sql
INSTALL ducklake;
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake
    (DATA_PATH 'data/');
USE my_ducklake;
```

With PostgreSQL as the catalog database:

```sql
INSTALL ducklake;
INSTALL postgres;
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=your_postgres_host'
    AS my_ducklake (DATA_PATH 'data/');
USE my_ducklake;
```

With SQLite as the catalog database:

```sql
INSTALL ducklake;
INSTALL sqlite;
ATTACH 'ducklake:sqlite:metadata.sqlite'
    AS my_ducklake (DATA_PATH 'data/');
USE my_ducklake;
```

Frequently asked questions
Do you have any questions about DuckLake? We've got you covered.
Why should I use DuckLake?
DuckLake provides a lightweight one-stop solution if you need a data lake and catalog.
You can use DuckLake for a “multiplayer DuckDB” setup with multiple DuckDB instances reading and writing the same dataset – a concurrency model not supported by vanilla DuckDB.
If you only use DuckDB for both your DuckLake entry point and your catalog database, you can still benefit from DuckLake: you can run time travel queries, exploit data partitioning, and can store your data in multiple files instead of using a single (potentially very large) database file.
What is “DuckLake”?
"DuckLake" can refer to several things:
- The DuckLake format, which uses a catalog database and Parquet storage to store data.
- A DuckLake instance, i.e., a dataset stored in the DuckLake lakehouse format.
- The `ducklake` DuckDB extension, which supports reading and writing datasets in the DuckLake format.
Is DuckLake production-ready?
Yes. DuckLake v1.0 is a production-ready release of the DuckLake specification.
What is the license of DuckLake?
Both the DuckLake specification and the `ducklake` DuckDB extension are released under the MIT license.