
DuckLake is an integrated data lake and catalog format

We released DuckLake v1.0, a production-ready version of the DuckLake specification.

Deployed in production

Summer Forever, Altertable, Windmill, Decision Computing, locals.com, Summation, Media Cluster Norway, Ascend.io, PostHog, Sliplane, Austrian Supply Chain Intelligence Institute
[Diagram: multiple clients connect to a shared DuckLake]

  • Multiple clients
  • Works locally or in the cloud
  • No vendor lock-in

Catalog: the catalog is served by an ACID-compliant SQL database such as PostgreSQL, SQLite, or DuckDB.

Storage: Parquet files, stored on local disk or in object storage.

DuckLake’s key features

Data lake operations

DuckLake supports snapshots, time travel queries, schema evolution and partitioning.
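As a sketch of what time travel and schema evolution look like in practice, assuming a DuckLake table with the hypothetical name `events` is already attached:

```sql
-- Query the table as of a specific snapshot version
SELECT * FROM events AT (VERSION => 2);

-- Query the table as it was at a point in time
SELECT * FROM events AT (TIMESTAMP => now() - INTERVAL '1 week');

-- Schema evolution: add a column; earlier snapshots remain readable
ALTER TABLE events ADD COLUMN country VARCHAR;
```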

Lightweight snapshots

You can create as many snapshots as you need without frequent compaction steps!
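The ducklake extension exposes snapshots through a table function. A minimal sketch, assuming a DuckLake attached under the name `my_ducklake` as in the example below:

```sql
-- List all snapshots of the attached DuckLake
SELECT snapshot_id, snapshot_time
FROM ducklake_snapshots('my_ducklake');
```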

ACID transactions

DuckLake allows concurrent access with ACID transactional guarantees over multi-table operations.
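A multi-table transaction is plain SQL; the hypothetical `orders` and `inventory` tables here are illustrative, not part of DuckLake:

```sql
BEGIN TRANSACTION;
INSERT INTO orders VALUES (1, 'duck decoy', 19.99);
UPDATE inventory SET stock = stock - 1 WHERE item = 'duck decoy';
COMMIT;  -- both changes become visible atomically, or neither does
```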

Performance-oriented

DuckLake uses statistics for filter pushdown, enabling fast queries even on large datasets.


In conversation: DuckLake v1.0

Listen to Mark Raasveldt and Pedro Holanda discuss the road that led
to the DuckLake specification and explain the features of DuckLake v1.0.

Create your first DuckLake

DuckDB provides first-class support for DuckLake and can use PostgreSQL, SQLite or DuckDB as the catalog database.

  • DuckDB
  • SQLite
  • PostgreSQL
INSTALL ducklake;

ATTACH 'ducklake:metadata.ducklake'
    AS my_ducklake
    (DATA_PATH 'data/');

USE my_ducklake;
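Once attached, the lake behaves like a regular database. A minimal sketch with a hypothetical table:

```sql
CREATE TABLE quacks (id INTEGER, sound VARCHAR);
INSERT INTO quacks VALUES (1, 'quack');
SELECT * FROM quacks;
```

With the attachment above, table data is written as Parquet files under `data/`, while the metadata lives in `metadata.ducklake`.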

Frequently asked questions

Do you have any questions about DuckLake? We've got you covered.

Why should I use DuckLake?

DuckLake provides a lightweight one-stop solution if you need a data lake and catalog.

You can use DuckLake for a “multiplayer DuckDB” setup with multiple DuckDB instances reading and writing the same dataset – a concurrency model not supported by vanilla DuckDB.

If you only use DuckDB for both your DuckLake entry point and your catalog database, you can still benefit from DuckLake: you can run time travel queries, exploit data partitioning, and can store your data in multiple files instead of using a single (potentially very large) database file.

What is “DuckLake”?

“DuckLake” can refer to a number of things:
  1. The DuckLake format, which uses a catalog database and Parquet files to store data.
  2. A DuckLake instance storing a dataset with the DuckLake lakehouse format.
  3. The ducklake DuckDB extension, which supports reading/writing datasets using the DuckLake format.

Is DuckLake production-ready?

Yes! We published DuckLake v1.0 in April 2026, a production-ready release with backward compatibility guarantees.

What is the license of DuckLake?

The DuckLake specification and the ducklake DuckDB extension are released under the MIT license.