⌘+k ctrl+k
0.3 (stable)
Search Shortcut cmd + k | ctrl + k
Cleanup of Files

In DuckLake you may want to delete data files that are either a part of an expired snapshot or are simply not tracked by DuckLake anymore (i.e., orphaned files).

Cleanup of Files from Expired Snapshots

When files are no longer required in DuckLake, due to e.g. snapshots being expired or files being merged, they are not immediately deleted. The reason for this is that there might still be active queries that are scanning these files.

The files are instead added to the ducklake_files_scheduled_for_deletion table. These files can then be deleted at a later point. It is generally safe to delete files that have been scheduled for deletion more than a few days ago, provided there are no read transactions that last that long. The files can be deleted using the ducklake_cleanup_old_files function.

Usage

The below command deletes all files scheduled for deletion.

CALL ducklake_cleanup_old_files('ducklake', cleanup_all => true);

The below command deletes all files that were scheduled for deletion more than a week ago.

CALL ducklake_cleanup_old_files('ducklake', older_than => now() - INTERVAL '1 week');

The below command performs a dry run, which only lists the files that will be deleted, instead of actually deleting them.

CALL ducklake_cleanup_old_files('ducklake', dry_run => true, older_than => now() - INTERVAL '1 week');

Cleanup of Orphaned Files

Orphan files are files that are untracked by the DuckLake metadata catalog. They may appear due to, for example, a systems failure. DuckLake provides the ducklake_delete_orphaned_files function to delete these files.

Usage

The below command deletes all orphaned files.

CALL ducklake_delete_orphaned_files('ducklake', cleanup_all => true);

The below command deletes all files that are older than a specified time.

CALL ducklake_delete_orphaned_files('ducklake', older_than => now() - INTERVAL '1 week');

The below command performs a dry run, which only lists the files that will be deleted, instead of actually deleting them.

CALL ducklake_delete_orphaned_files('ducklake', dry_run => true, older_than => now() - INTERVAL '1 week');

There is also a catalog-level option available.

CALL ducklake.set_option('delete_older_than', '1 week');