Data lake operations
DuckLake supports snapshots, time travel queries, schema evolution and partitioning.
DuckLake delivers advanced data lake features without traditional lakehouse complexity by using Parquet files and your SQL database. It's an open, standalone format from the DuckDB team.
With DuckLake, all you need to run your own data warehouse is a catalog database and storage for Parquet files.
Multiple DuckLake clients can connect concurrently to the catalog database – PostgreSQL, MySQL or SQLite – and work on the same DuckLake dataset.
DuckLake also works with DuckDB as the catalog database. In this case, you are limited to a single client but still get to enjoy the data lake features such as time travel.
DuckLake can use any SQL system as its catalog database, provided that it supports ACID transactions and primary key constraints.
DuckLake can store your data on any object storage such as AWS S3.
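As a sketch of storing data on S3 (the bucket name, credentials, and region below are placeholders, not part of the original text), you can point `DATA_PATH` at an S3 prefix and configure credentials with DuckDB's secrets mechanism:

```sql
INSTALL ducklake;
INSTALL httpfs;  -- DuckDB's extension for S3/HTTP object storage

-- Placeholder credentials; replace with your own setup
CREATE SECRET (
    TYPE s3,
    KEY_ID 'your_key_id',
    SECRET 'your_secret',
    REGION 'us-east-1'
);

-- Catalog metadata stays local; Parquet data files go to S3
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake
    (DATA_PATH 's3://your-bucket/data_files/');
USE my_ducklake;
```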
You can have as many snapshots as you want without frequent compacting steps!
DuckLake allows concurrent access with ACID transactional guarantees over multi-table operations.
DuckLake uses statistics for filter pushdown, enabling fast queries even on large datasets.
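A minimal sketch of snapshots and time travel, assuming a local DuckDB-backed catalog (the table and column names are made up for illustration):

```sql
INSTALL ducklake;
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake;
USE my_ducklake;

CREATE TABLE events (id INTEGER, payload VARCHAR);
INSERT INTO events VALUES (1, 'first');   -- each committed change creates a snapshot
INSERT INTO events VALUES (2, 'second');

-- Time travel: query the table as of an earlier snapshot
SELECT * FROM events AT (VERSION => 2);

-- Inspect the snapshots recorded in the catalog
SELECT * FROM my_ducklake.snapshots();
```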
Listen to Hannes Mühleisen and Mark Raasveldt walk through the history of data lakes and introduce DuckLake, a new lakehouse format.
DuckDB provides first-class support for DuckLake through its highly portable extension, running wherever DuckDB does.
INSTALL ducklake;
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake;
USE my_ducklake;
INSTALL ducklake;
INSTALL postgres;
-- Make sure that the database `ducklake_catalog` exists in PostgreSQL.
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=your_postgres_host' AS my_ducklake
(DATA_PATH 'data_files/');
USE my_ducklake;
INSTALL ducklake;
INSTALL sqlite;
ATTACH 'ducklake:sqlite:metadata.sqlite' AS my_ducklake
(DATA_PATH 'data_files/');
USE my_ducklake;
INSTALL ducklake;
INSTALL mysql;
-- Make sure that the database `ducklake_catalog` exists in MySQL
ATTACH 'ducklake:mysql:db=ducklake_catalog host=your_mysql_host' AS my_ducklake
(DATA_PATH 'data_files/');
USE my_ducklake;
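Once attached via any of the variants above, ordinary SQL works against the DuckLake catalog, and schema changes are versioned along with the data. A hedged sketch (table and column names are invented):

```sql
-- Runs against whichever catalog you attached above
CREATE TABLE measurements (station VARCHAR, temp DOUBLE);
INSERT INTO measurements VALUES ('A', 21.5);

-- Schema evolution: add a column without rewriting existing Parquet files
ALTER TABLE measurements ADD COLUMN unit VARCHAR;
```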
Answers to common questions to help you understand and make the most of DuckLake.
DuckLake provides a lightweight one-stop solution if you need a data lake and catalog.
You can use DuckLake for a “multiplayer DuckDB” setup with multiple DuckDB instances reading and writing the same dataset – a concurrency model not supported by vanilla DuckDB.
If you use DuckDB as both your DuckLake entry point and your catalog database, you are limited to a single client, but you still benefit from DuckLake: you can run time travel queries, exploit data partitioning, and store your data in multiple files instead of a single (potentially very large) database file.
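Partitioning is configured per table. A sketch, assuming DuckLake's `SET PARTITIONED BY` clause (the table layout here is illustrative):

```sql
CREATE TABLE events (event_date DATE, id INTEGER, payload VARCHAR);

-- Newly written Parquet files are grouped by the partition expression
ALTER TABLE events SET PARTITIONED BY (year(event_date));

INSERT INTO events VALUES (DATE '2025-06-01', 1, 'first');
```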
The ducklake DuckDB extension supports reading/writing datasets using the DuckLake format. The DuckLake format and the ducklake DuckDB extension are released under the MIT license.