⌘+k ctrl+k
0.3 (stable)
Search Shortcut cmd + k | ctrl + k
Using a Remote Data Path

This guide shows how to set up and load a DuckLake locally, where the DuckLake will be served on an https:// endpoint.

Tip DuckLake currently does not allow you to change the persisted data path in the catalog. This is a known limitation that will be lifted in the future, rendering the future version of this guide almost trivial.

In this guide, we assume that we want to create a DuckLake at https://blobs.duckdb.org/datalake/tpch-sf1/ to serve a read-only copy of the TPC-H SF1 dataset.

Initializing the DuckLake

We initialize the DataLake using the remote URL as the data path and immediately detach from the DuckLake:

ATTACH 'tpch-sf1.ducklake' AS tpch_sf1_ducklake (
    TYPE ducklake,
    DATA_PATH 'https://blobs.duckdb.org/datalake/tpch-sf1'
);
DETACH tpch_sf1_ducklake;

Populating the DuckLake

Next, we attach to the DuckLake with a local data path using the OVERRIDE_DATA_PATH true flag:

ATTACH 'tpch-sf1.ducklake' AS tpch_sf1_ducklake (
    TYPE ducklake,
    DATA_PATH 'tpch-sf1',
    OVERRIDE_DATA_PATH true
);
USE tpch_sf1_ducklake;

We generate the data using the tpchgen-cli tool:

tpchgen-cli --scale-factor 1 --format parquet

We then load the data into the DuckLake:

CREATE TABLE customer AS FROM 'customer.parquet';
CREATE TABLE lineitem AS FROM 'lineitem.parquet';
CREATE TABLE nation AS FROM 'nation.parquet';
CREATE TABLE orders AS FROM 'orders.parquet';
CREATE TABLE part AS FROM 'part.parquet';
CREATE TABLE partsupp AS FROM 'partsupp.parquet';
CREATE TABLE region AS FROM 'region.parquet';
CREATE TABLE supplier AS FROM 'supplier.parquet';

Finally, we detach from the DuckLake and close DuckDB:

DETACH tpch_sf1_ducklake;
.quit

Uploading the DuckLake

Now, we have the tpch-sf1/ directory with the Parquet files and the tpch-sf1.ducklake file:

ls tpch-sf1.ducklake
tpch-sf1.ducklake
tree tpch-sf1
tpch-sf1
└── main
    ├── customer
    │   └── ducklake-019a2726-0362-7eae-8a4d-1404ead2c506.parquet
    ├── lineitem
    │   └── ducklake-019a2726-03ad-7d79-b28a-6ae3f114fbd3.parquet
    ├── nation
    │   └── ducklake-019a2726-055d-7f55-914d-02787bda2eae.parquet
    ├── orders
    │   └── ducklake-019a2726-0579-72e2-be9e-0190b3f8f8af.parquet
    ├── part
    │   └── ducklake-019a2726-05fd-7d30-8051-a9172d75f815.parquet
    ├── partsupp
    │   └── ducklake-019a2726-0645-765b-9456-6ba632d8288b.parquet
    ├── region
    │   └── ducklake-019a2726-06a4-7b3a-9788-8f32006be0ad.parquet
    └── supplier
        └── ducklake-019a2726-06bb-72f4-97b3-131d7186787e.parquet

We upload both of these to https://blobs.duckdb.org/datalake/.

Using the DuckLake

You can now attach to the remote DuckLake as follows:

ATTACH 'https://blobs.duckdb.org/datalake/tpch-sf1.ducklake' AS tpch_sf1_ducklake (TYPE ducklake);
USE tpch_sf1_ducklake;

Now you can use it like any other DuckDB database or DuckLake:

SELECT count(*) FROM lineitem;
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    6001215     │
│ (6.00 million) │
└────────────────┘