Welcome to the DuckLake documentation. This documentation has two parts:
- The DuckLake specification: The specification of the DuckLake lakehouse format. It describes the SQL tables and queries used to define the catalog database.
- The ducklake DuckDB extension: User guide for the ducklake DuckDB extension. It presents the features of DuckLake through examples.
When Should I Use DuckLake?
DuckLake provides a lightweight one-stop solution if you need a lakehouse, i.e., a data lake with a catalog. DuckLake has all the features provided by lakehouse formats: you can run time travel queries, exploit data partitioning, and perform schema evolution. You can also store your data in multiple files instead of a single (potentially very large) database file, a layout that works well with object storage (e.g., Amazon S3).
If you use DuckLake from DuckDB, you can achieve a “multiplayer DuckDB” setup in which multiple processes read and write the same dataset, a concurrency model currently not supported by DuckDB's native database format.
List of DuckLake Clients
The ducklake DuckDB extension serves as the reference implementation for DuckLake clients.
Additionally, DuckLake currently has implementations for the following libraries (at various levels of maturity):
Single File Documentation
You can download the documentation as a single file: