⌘+k ctrl+k
0.3 (stable)
Search Shortcut cmd + k | ctrl + k
Public DuckLake on Object Storage

This guide explains how to set up a public read-only DuckLake on object storage such as Amazon S3, Backblaze B2 and Cloudflare R2. Users can query this DuckLake through HTTPS without authentication.

Warning Please check the pricing models of the providers to understand the costs of serving DuckLakes.

The setup described here is conceptually similar to Frozen DuckLakes but it is simpler to set up.

Steps

Creating the Bucket

Create a new public bucket:

Make sure that the bucket is accessible through the internet. The exact settings for this vary from platfrom to platform.

Creating the DuckLake

Create a new DuckLake following the “Using a Remove Data Path” guide using DuckDB as the catalog database and set the data path to the https:// URL that serves your bucket.

Uploading the DuckLake

Upload the .ducklake file and its data directory to the bucket. We recommend using Rclone, which supports all object storage platforms listed above.

rclone config

Create a new Rclone remote (see the details in the foldouts below). In our examples, we will call this your_rclone_remote.

Storage configuration in Rclone
Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
 1 / 1Fichier
   \ (fichier)
 2 / Akamai NetStorage
   \ (netstorage)
 3 / Alias for an existing remote
   \ (alias)
 4 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, ArvanCloud, BizflyCloud, Ceph, ChinaMobile, Cloudflare, Cubbit, DigitalOcean, Dreamhost, Exaba, FileLu, FlashBlade, GCS, Hetzner, HuaweiOBS, IBMCOS, IDrive, Intercolo, IONOS, Leviia, Liara, Linode, LyveCloud, Magalu, Mega, Minio, Netease, Outscale, OVHcloud, Petabox, Qiniu, Rabata, RackCorp, Rclone, Scaleway, SeaweedFS, Selectel, Servercore, SpectraLogic, StackPath, Storj, Synology, TencentCOS, Wasabi, Zata, Other
   \ (s3)
 5 / Backblaze B2
   \ (b2)
...

To upload your data to the R2 bucket, run:

rclone cp -v your_ducklake_catalog.ducklake your_rclone_remote:your_bucket_name/path/
rclone sync -v your_ducklake_directory your_rclone_remote:your_bucket_name/path/

Cloudflare: Setting the CORS Policy for Browser Access

If you are using Cloudflare as your storage and try to query the dataset from another website – such as the online DuckDB shell –, you will get an error due to the CORS (Cross-Origin Resource Sharing) security mechanism:

IO Error: Failed to attach DuckLake MetaData "__ducklake_metadata_..." at path + "..."
Cannot open database "..." in read-only mode: database does not exist

or

Invalid Error: Failed to attach DuckLake MetaData "__ducklake_metadata..." at path + "..."
Opening file '...' failed with error:
NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load '...'.

To allow querying the dataset, add a CORS policy to the Cloudflare configuration of the bucket. How to do this depends on whether you are serving directly from the bucket or through a Cloudflare Worker.

If you are serving through a Cloudflare Worker, edit the code of the Worker following the “CORS header proxy” and add the following to the JavaScript code of your fetch function:

const allowedOrigins = [
  "https://duckdb.org",
  "https://shell.duckdb.org",
];

const origin = request.headers.get("Origin");

let corsOrigin = "";

if (allowedOrigins.includes(origin)) {
  corsOrigin = origin;
}
return new Response(object.body, {
  headers: {
    "Content-Type": contentType,
    "Access-Control-Allow-Origin": corsOrigin,
    "Cache-Control": "public, max-age=3600",
  }
});

If you are serving directly from the bucket, navigate to its settings and add the following CORS Policy:

[
  {
    "AllowedOrigins": [
      "https://duckdb.org",
      "https://shell.duckdb.org"
    ],
    "AllowedMethods": [
      "GET"
    ]
  }
]

For the complete Cloudflare guide on CORS policies, see the “Configure CORS” page.