This table contains per-file statistics for the shredded sub-fields of variant columns.
| Column name | Column type | |
|---|---|---|
data_file_id |
BIGINT |
|
table_id |
BIGINT |
|
column_id |
BIGINT |
|
variant_path |
VARCHAR |
|
shredded_type |
VARCHAR |
|
column_size_bytes |
BIGINT |
|
value_count |
BIGINT |
|
null_count |
BIGINT |
|
min_value |
VARCHAR |
|
max_value |
VARCHAR |
|
contains_nan |
BOOLEAN |
|
extra_stats |
VARCHAR |
data_file_idrefers to adata_file_idfrom theducklake_data_filetable.table_idrefers to atable_idfrom theducklake_tabletable.column_idrefers to acolumn_idfrom theducklake_columntable and identifies thevariantcolumn that contains the shredded field.variant_pathis the path to the shredded sub-field within the variant. Named fields are always quoted (e.g.,"l_orderkey"), and any quote characters within a field name are escaped by doubling them. Two special path values exist:root— the variant value itself is a primitive (i.e., not nested).element— the statistics apply to elements of an array-typed variant.- Paths can be composed, e.g.,
element."a"refers to theafield of each element of an array.
shredded_typeis the DuckLake type name of the shredded field (e.g.,int64,date,decimal(15,2),varchar).column_size_bytesis the byte size of the shredded field in this file.value_countis the number of non-null values in the shredded field.null_countis the number of rows where the shredded field isNULLor absent.min_valuecontains the minimum value for the shredded field, encoded as a string.max_valuecontains the maximum value for the shredded field, encoded as a string.contains_nanis a flag whether the shredded field contains anyNaNvalues. Only relevant for floating-point types.extra_statsis reserved for additional type-specific statistics.
A row is written to this table for every fully shredded sub-field of a variant column in a data file. A sub-field is considered fully shredded when, for every row in the file, the field is either present as a primitive of a single consistent type, absent, or NULL. Fields that mix types across rows are not recorded.
Global (table-wide) statistics for shredded variant fields that are consistently shredded across every file are stored in the extra_stats column of the ducklake_table_column_stats table, encoded as JSON.