ducklake_file_column_statistics


This table contains column-level statistics for a single data file.

data_file_id refers to a data_file_id from the ducklake_data_file table.
table_id refers to a table_id from the ducklake_table table.
column_id refers to a column_id from the ducklake_column table.
column_size_bytes is the byte size of the column.
value_count is the number of values in the column. For nested types, this does not have to match the number of records in the file.
null_count is the number of values in the column that are NULL.
nan_count is the number of values in the column that are NaN. This is only relevant for floating-point types.
min_value contains the minimum value for the column, encoded as a string. It does not have to be exact, but it has to be a lower bound. The value has to be cast to the column's actual type for accurate comparison, e.g. on integer types.
max_value contains the maximum value for the column, encoded as a string. It does not have to be exact, but it has to be an upper bound. The value has to be cast to the column's actual type for accurate comparison, e.g. on integer types.
contains_nan is a flag indicating whether the column contains any NaN values. This is only relevant for floating-point types.
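
To illustrate why min_value and max_value have to be cast before comparison, the following is a minimal Python sketch of how a reader might use the bounds to decide whether a data file can be skipped. The function name may_contain, its parameters, and the treatment of a missing bound as "unknown" are assumptions made for this example; they are not part of the specification.

```python
from typing import Callable, Optional


def may_contain(
    min_value: Optional[str],
    max_value: Optional[str],
    target,
    cast: Callable = int,
) -> bool:
    """Return False only when the bounds prove the file cannot hold target.

    min_value and max_value are the string-encoded bounds. A bound of None
    is treated as unknown (an assumption of this sketch), so it never
    excludes the file.
    """
    # Cast the stored strings to the column type before comparing.
    if min_value is not None and target < cast(min_value):
        return False
    if max_value is not None and target > cast(max_value):
        return False
    # The bounds are not required to be exact, so True only means "maybe".
    return True


# Comparing the raw strings orders integers incorrectly: "9" sorts after "10".
print("9" > "10")            # True
print(int("9") > int("10"))  # False

print(may_contain("9", "100", 10))  # True: 9 <= 10 <= 100 after casting
print(may_contain("9", "100", 5))   # False: 5 is below the lower bound
```

Because the bounds only have to be a lower and an upper bound, a True result means the file has to be read to find out, while a False result means it can be skipped safely.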