Frequently Asked Questions
Overview
Why should I use DuckLake?
DuckLake provides a lightweight one-stop solution if you need a data lake and catalog.
You can use DuckLake for a “multiplayer DuckDB” setup with multiple DuckDB instances reading and writing the same dataset – a concurrency model not supported by vanilla DuckDB.
Even if you use DuckDB as both your DuckLake entry point and your catalog database, you still benefit from DuckLake: you can run time travel queries, exploit data partitioning, and store your data in multiple files instead of a single (potentially very large) database file.
Is DuckLake an open table format?
DuckLake includes an open table format, but it is also a data lakehouse format, meaning that it also contains a catalog that encodes the schema of the stored data. Compared to other technologies, DuckLake is similar to Delta Lake with Unity Catalog and to Iceberg with Lakekeeper or Polaris.
What is “DuckLake”?
First of all, a catchy name for a DuckDB-originated technology for data lakes and lakehouses. More seriously, the term “DuckLake” can refer to three things:
- the specification of the DuckLake lakehouse format,
- the `ducklake` DuckDB extension, which supports reading/writing datasets in the DuckLake specification,
- a DuckLake, a dataset stored using the DuckLake lakehouse format.
Architecture
What are the main components of DuckLake?
DuckLake needs a storage layer (both blob storage and block-based storage work) and a catalog database (any SQL-compatible database works).
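To make the split between the two components concrete, here is a minimal toy sketch in Python. It is not the DuckLake schema or API: SQLite stands in for the catalog database, a temporary directory stands in for the storage layer, CSV files stand in for Parquet data files, and the table, column, and function names (`data_file`, `begin_snapshot`, `end_snapshot`, `append_rows`, `read_table`) are hypothetical. It only illustrates the general idea that immutable data files live in storage while the catalog records which files are visible at which snapshot.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Toy stand-ins (assumptions, not DuckLake internals): SQLite plays the
# catalog database, a temp directory plays the storage layer, and CSV
# files play the role of Parquet data files.


def open_catalog():
    """Create an in-memory catalog with one hypothetical bookkeeping table."""
    catalog = sqlite3.connect(":memory:")
    catalog.execute(
        "CREATE TABLE data_file ("
        "path TEXT, begin_snapshot INTEGER, end_snapshot INTEGER)"
    )
    return catalog


def append_rows(catalog, storage, snapshot_id, rows):
    """Write rows to a new immutable file, then register it in the catalog."""
    path = storage / f"part-{snapshot_id}.csv"
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    with catalog:  # one catalog transaction per snapshot
        catalog.execute(
            "INSERT INTO data_file VALUES (?, ?, NULL)",
            (str(path), snapshot_id),
        )


def read_table(catalog, snapshot_id):
    """Return all rows from the files visible at the given snapshot."""
    paths = catalog.execute(
        "SELECT path FROM data_file "
        "WHERE begin_snapshot <= ? "
        "AND (end_snapshot IS NULL OR end_snapshot > ?) "
        "ORDER BY begin_snapshot",
        (snapshot_id, snapshot_id),
    ).fetchall()
    rows = []
    for (path,) in paths:
        with open(path, newline="") as f:
            rows.extend(tuple(r) for r in csv.reader(f))
    return rows


storage = Path(tempfile.mkdtemp())
catalog = open_catalog()
append_rows(catalog, storage, 1, [("1", "duck")])
append_rows(catalog, storage, 2, [("2", "goose")])

print(read_table(catalog, 1))  # rows as of snapshot 1
print(read_table(catalog, 2))  # rows as of snapshot 2
```

Because the files are never modified in place, reading an older snapshot in this sketch is only a matter of asking the catalog for a different set of file paths.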
Does DuckLake work on AWS S3 (or a compatible storage)?
DuckLake can store the data files (Parquet files) on AWS S3 blob storage or compatible solutions such as Azure Blob Storage, Google Cloud Storage, or Cloudflare R2. You can run the catalog database anywhere, e.g., in an AWS Aurora database.
DuckLake in Operation
Is DuckLake production-ready?
While we tested DuckLake extensively, it is not yet production-ready, as its version number indicates. We expect DuckLake to mature over the course of 2025.
How is authentication implemented in DuckLake?
DuckLake piggybacks on the authentication of the metadata catalog database. For example, if your catalog database is Postgres, you can use Postgres' authentication and authorization methods to protect your DuckLake. This is particularly effective when enabling encryption of DuckLake files.
How does DuckLake deal with the “small files problem”?
The “small files problem” is a well-known problem in data lake formats. It occurs, e.g., when data is inserted in small batches, yielding many small files that each store only a small amount of data. DuckLake significantly mitigates this problem by storing the metadata in a database system (the catalog database) and making the compaction step simple. DuckLake also harnesses the catalog database to stage data (a technique called “data inlining”) before serializing it into Parquet files. Further improvements are on the roadmap.
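The staging idea can be illustrated with a small toy model in Python. This is a sketch under stated assumptions, not DuckLake's implementation: SQLite stands in for the catalog database, CSV files stand in for Parquet files, and `INLINE_ROW_LIMIT`, `inlined_row`, `data_file`, `insert_rows`, and `flush` are hypothetical names chosen for this example. DuckLake's actual inlining rules and thresholds may differ.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

INLINE_ROW_LIMIT = 3  # hypothetical threshold, chosen for this example

storage = Path(tempfile.mkdtemp())      # stand-in for the storage layer
catalog = sqlite3.connect(":memory:")   # stand-in for the catalog database
catalog.execute("CREATE TABLE inlined_row (id INTEGER, name TEXT)")
catalog.execute("CREATE TABLE data_file (path TEXT, row_count INTEGER)")


def flush():
    """Write all staged rows to one data file and register it in the catalog."""
    rows = catalog.execute(
        "SELECT id, name FROM inlined_row ORDER BY id"
    ).fetchall()
    path = storage / f"part-{len(list(storage.iterdir()))}.csv"
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    with catalog:  # register the file and clear the staging table together
        catalog.execute(
            "INSERT INTO data_file VALUES (?, ?)", (str(path), len(rows))
        )
        catalog.execute("DELETE FROM inlined_row")


def insert_rows(rows):
    """Stage a small batch in the catalog instead of writing a tiny file."""
    with catalog:
        catalog.executemany("INSERT INTO inlined_row VALUES (?, ?)", rows)
    (staged,) = catalog.execute("SELECT count(*) FROM inlined_row").fetchone()
    if staged >= INLINE_ROW_LIMIT:
        flush()


# Four single-row inserts: without staging, this would create four tiny files.
for i in range(4):
    insert_rows([(i, f"row-{i}")])

files = catalog.execute("SELECT row_count FROM data_file").fetchall()
staged = catalog.execute("SELECT count(*) FROM inlined_row").fetchone()[0]
print(files, staged)
```

In this sketch, the four inserts produce a single three-row file, and the fourth row stays staged in the catalog until the next flush.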
Features
Are constraints such as primary keys and foreign keys supported?
No. Similarly to other data lakehouse technologies, DuckLake does not support constraints, keys, or indexes.
Can I export my DuckLake into other lakehouse formats?
This is currently not supported but is planned for the future. Currently, you can export a DuckLake into a DuckDB database and export it into, e.g., vanilla Parquet files.
Are DuckDB database files supported as the data files for DuckLake?
The data files of DuckLake must be stored in Parquet. Using DuckDB files as storage is not supported at the moment.
Are there any practical limits to the size of data and the number of snapshots?
No. The only limitation is the catalog database's performance, but even with a relatively slow catalog database, you can have terabytes of data and millions of snapshots.
Development
How is DuckLake tested?
DuckLake receives extensive testing, including running the applicable subset of DuckDB's thorough test suite. That said, if you encounter any problems using DuckLake, please submit an issue in the DuckLake issue tracker.
How can I contribute to DuckLake?
If you encounter any problems using DuckLake, please submit an issue in the DuckLake issue tracker. If you have any suggestions or feature requests, please open a ticket in DuckLake's discussion forum. You are also welcome to implement support in other systems for DuckLake following the specification.
What is the license of DuckLake?
The DuckLake specification and the DuckLake DuckDB extension are released under the MIT license.