DuckLake tables can be configured with a sort order. When a sort order is defined, data is physically sorted by the specified columns whenever it is written out as Parquet — during INSERT, file compaction and inlined data flushing.
Sorting data before writing improves the effectiveness of min/max statistics at query time, which allows the DuckDB query engine to skip data files whose value ranges do not overlap with a query's filter predicates.
Example Setup
The examples on this page use the following DuckLake instance:
ATTACH 'ducklake:sorted.duckdb' AS my_ducklake (DATA_PATH 'data/');
USE my_ducklake;
CREATE TABLE events (event_time TIMESTAMP, event_type VARCHAR, value DOUBLE);
Setting a Sort Order
Set the sort order for a table using SET SORTED BY:
ALTER TABLE events SET SORTED BY (event_time ASC);
Multiple sort keys are supported:
ALTER TABLE events SET SORTED BY (event_time ASC, event_type DESC);
ASC and DESC control the sort direction. NULLS FIRST and NULLS LAST are also supported to control null ordering:
ALTER TABLE events SET SORTED BY (event_time ASC NULLS LAST);
Expression-Based Sort Keys
Arbitrary expressions are supported in SET SORTED BY, not just column references. This includes function calls, casts, and DuckLake macros.
Sort by the hour extracted from a timestamp:
ALTER TABLE events SET SORTED BY (date_trunc('hour', event_time) ASC);
Sort by a DuckLake macro:
CREATE MACRO event_bucket(t) AS date_trunc('day', t);
ALTER TABLE events SET SORTED BY (event_bucket(event_time) ASC);
Expressions are validated when SET SORTED BY is executed — an error is returned if any referenced columns or functions cannot be resolved.
Removing a Sort Order
To remove the sort order from a table, use RESET SORTED BY:
ALTER TABLE events RESET SORTED BY;
After resetting, subsequent compactions and flushes will write data without sorting.
Effect on Insert
By default, INSERT statements automatically sort data according to the table's sort order before writing Parquet files. This behavior is controlled by the sort_on_insert option, which defaults to true.
To disable sorting on insert (e.g., when insertion speed is the primary concern):
CALL my_ducklake.set_option('sort_on_insert', false, table_name => 'events');
When sort_on_insert is disabled, data written to Parquet during compaction and inlined data flushing is still sorted according to the table's sort order.
Interaction with Data Inlining
When data inlining is enabled and sort_on_insert is false, data that exceeds the inlining row limit is still sorted before being written to Parquet. Inlined data (which stays in the metadata catalog) is not sorted at insert time — it will be sorted when it is flushed.
Effect on Compaction and Flush
Once a sort order is set, the current sort order is applied at the time of compaction or flush — not the sort order that was active when the source data was written.
When ducklake_merge_adjacent_files runs on a sorted table, the merged output file is sorted:
ALTER TABLE events SET SORTED BY (event_time ASC);
CALL ducklake_merge_adjacent_files('my_ducklake', 'events');
The same applies when flushing inlined data:
CALL ducklake_flush_inlined_data('my_ducklake', table_name => 'events');