DuckLake tables can be configured with a sort order. When a sort order is defined, data is physically sorted by the specified columns whenever it is written out as Parquet — during file compaction and inlined data flushing.
Sorting data before writing improves the effectiveness of min/max statistics at query time, which allows the DuckDB query engine to skip data files whose value ranges do not overlap with a query's filter predicates.
Note Sorting applies during compaction (
ducklake_merge_adjacent_files) and inlined data flushing (ducklake_flush_inlined_data). Data inserted directly is not sorted at write time. You can include anORDER BYclause in yourINSERTstatement as a workaround until insert-time sorting is supported.
Example Setup
The examples on this page use the following DuckLake instance:
ATTACH 'ducklake:sorted.duckdb' AS my_ducklake (DATA_PATH 'data/');
USE my_ducklake;
CREATE TABLE events (event_time TIMESTAMP, event_type VARCHAR, value DOUBLE);
Setting a Sort Order
Set the sort order for a table using SET SORTED BY:
ALTER TABLE events SET SORTED BY (event_time ASC);
Multiple sort keys are supported:
ALTER TABLE events SET SORTED BY (event_time ASC, event_type DESC);
ASC and DESC control the sort direction. NULLS FIRST and NULLS LAST are also supported to control null ordering:
ALTER TABLE events SET SORTED BY (event_time ASC NULLS LAST);
Expression-Based Sort Keys
Arbitrary expressions are supported in SET SORTED BY, not just column references. This includes function calls, casts, and DuckLake macros.
Sort by the hour extracted from a timestamp:
ALTER TABLE events SET SORTED BY (date_trunc('hour', event_time) ASC);
Sort by a DuckLake macro:
CREATE MACRO event_bucket(t) AS date_trunc('day', t);
ALTER TABLE events SET SORTED BY (event_bucket(event_time) ASC);
Expressions are validated when SET SORTED BY is executed — an error is returned if any referenced columns or functions cannot be resolved.
Removing a Sort Order
To remove the sort order from a table, use RESET SORTED BY:
ALTER TABLE events RESET SORTED BY;
After resetting, subsequent compactions and flushes will write data without sorting.
Effect on Compaction and Flush
Once a sort order is set, the current sort order is applied at the time of compaction or flush — not the sort order that was active when the source data was written.
When ducklake_merge_adjacent_files runs on a sorted table, the merged output file is sorted:
ALTER TABLE events SET SORTED BY (event_time ASC);
CALL ducklake_merge_adjacent_files('my_ducklake', 'events');
The same applies when flushing inlined data:
CALL ducklake_flush_inlined_data('my_ducklake', table_name => 'events');
Limitations
- Sorting happens during compaction and inlined data flush, not during
INSERT. UseORDER BYin yourINSERTstatement if you need sorted data at insert time.