Data files contain the actual row data.
| Column name | Column type | |
|---|---|---|
| `data_file_id` | `BIGINT` | Primary key |
| `table_id` | `BIGINT` | |
| `begin_snapshot` | `BIGINT` | |
| `end_snapshot` | `BIGINT` | |
| `file_order` | `BIGINT` | |
| `path` | `VARCHAR` | |
| `path_is_relative` | `BOOLEAN` | |
| `file_format` | `VARCHAR` | |
| `record_count` | `BIGINT` | |
| `file_size_bytes` | `BIGINT` | |
| `footer_size` | `BIGINT` | |
| `row_id_start` | `BIGINT` | |
| `partition_id` | `BIGINT` | |
| `encryption_key` | `VARCHAR` | |
| `partial_file_info` | `VARCHAR` | |
| `mapping_id` | `BIGINT` | |

- `data_file_id` is the numeric identifier of the file. It is a primary key. `data_file_id` is incremented from `next_file_id` in the `ducklake_snapshot` table.
- `table_id` refers to a `table_id` from the `ducklake_table` table.
- `begin_snapshot` refers to a `snapshot_id` from the `ducklake_snapshot` table. The file is part of the table starting with this snapshot id.
- `end_snapshot` refers to a `snapshot_id` from the `ducklake_snapshot` table. The file is part of the table up to but not including this snapshot id. If `end_snapshot` is `NULL`, the file is currently part of the table.
- `file_order` is a number that defines the vertical position of the file in the table. It needs to be unique within a snapshot but does not have to be contiguous (gaps are ok).
- `path` is the file path of the data file, e.g., `my_file.parquet` for a relative path.
- `path_is_relative` indicates whether the `path` is relative to the `path` of the table (`true`) or an absolute path (`false`).
- `file_format` is the storage format of the file. Currently, only `parquet` is allowed.
- `record_count` is the number of records (rows) in the file.
- `file_size_bytes` is the size of the file in bytes.
- `footer_size` is the size of the file metadata footer; in the case of Parquet, this is the Thrift data. This is an optimization that allows for faster reading of the file.
- `row_id_start` is the first logical row id in the file. (Every row has a unique row id that is maintained.)
- `partition_id` refers to a `partition_id` from the `ducklake_partition_info` table.
- `encryption_key` contains the encryption key for the file if encryption is enabled.
- `partial_file_info` is used when snapshots refer to parts of a file.
- `mapping_id` refers to a `mapping_id` from the `ducklake_column_mapping` table.
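
The `begin_snapshot` / `end_snapshot` rule and the `path_is_relative` flag can be illustrated with a small sketch. It is not a reference implementation: it uses Python's built-in `sqlite3` module as a stand-in for the catalog database, creates only a subset of the columns, and the helper `files_at_snapshot`, the sample rows, and the table path `data/main/t/` are all hypothetical. Joining a relative path by plain string concatenation is also an assumption made for brevity.

```python
import sqlite3

# In-memory SQLite stands in for the catalog database (assumption for illustration).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE ducklake_data_file (
        data_file_id     BIGINT PRIMARY KEY,
        table_id         BIGINT,
        begin_snapshot   BIGINT,
        end_snapshot     BIGINT,
        file_order       BIGINT,
        path             VARCHAR,
        path_is_relative BOOLEAN,
        record_count     BIGINT
    )
""")

# Hypothetical sample rows for one table (table_id = 1).
con.executemany(
    "INSERT INTO ducklake_data_file VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (0, 1, 1, 3, 0, "a.parquet", True, 100),      # part of snapshots 1 and 2 only
        (1, 1, 2, None, 1, "b.parquet", True, 50),    # added in snapshot 2, still current
        (2, 1, 3, None, 2, "s3://other-bucket/c.parquet", False, 25),  # absolute path
    ],
)


def files_at_snapshot(con, table_id, snapshot_id, table_path):
    """Return resolved paths of the files that belong to the table at a snapshot.

    A file is included when begin_snapshot <= snapshot_id and
    end_snapshot is NULL or greater than snapshot_id (end is exclusive).
    """
    rows = con.execute(
        """
        SELECT path, path_is_relative
        FROM ducklake_data_file
        WHERE table_id = ?
          AND begin_snapshot <= ?
          AND (end_snapshot IS NULL OR end_snapshot > ?)
        ORDER BY file_order
        """,
        (table_id, snapshot_id, snapshot_id),
    ).fetchall()
    # Relative paths are resolved against the table path; absolute paths are used as-is.
    return [table_path + path if is_relative else path for path, is_relative in rows]


print(files_at_snapshot(con, 1, 2, "data/main/t/"))
print(files_at_snapshot(con, 1, 3, "data/main/t/"))
```

With these sample rows, `a.parquet` is returned for snapshot 2 but not for snapshot 3, because its `end_snapshot` of 3 is exclusive.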