Data files contain the actual row data.

| Column name | Column type | |
|---|---|---|
| `data_file_id` | `BIGINT` | Primary Key |
| `table_id` | `BIGINT` | |
| `begin_snapshot` | `BIGINT` | |
| `end_snapshot` | `BIGINT` | |
| `file_order` | `BIGINT` | |
| `path` | `VARCHAR` | |
| `path_is_relative` | `BOOLEAN` | |
| `file_format` | `VARCHAR` | |
| `record_count` | `BIGINT` | |
| `file_size_bytes` | `BIGINT` | |
| `footer_size` | `BIGINT` | |
| `row_id_start` | `BIGINT` | |
| `partition_id` | `BIGINT` | |
| `encryption_key` | `VARCHAR` | |
| `partial_file_info` | `VARCHAR` | |


- `data_file_id` is the numeric identifier of the file. It is a primary key. `data_file_id` is incremented from `next_file_id` in the `ducklake_snapshot` table.
- `table_id` refers to a `table_id` from the `ducklake_table` table.
- `begin_snapshot` refers to a `snapshot_id` from the `ducklake_snapshot` table. The file is part of the table starting with this snapshot id.
- `end_snapshot` refers to a `snapshot_id` from the `ducklake_snapshot` table. The file is part of the table until this snapshot id. If `end_snapshot` is `NULL`, the file is currently part of the table.
- `file_order` is a number that defines the vertical position of the file in the table. It needs to be unique within a snapshot but does not have to be contiguous (gaps are allowed).
- `path` is the file name of the data file, e.g., `my_file.parquet`. The file name is either relative to the `data_path` value in `ducklake_metadata` or absolute. If relative, the `path_is_relative` field is set to `true`.
- `path_is_relative` defines whether the path is absolute or relative, see above.
- `file_format` is the storage format of the file. Currently, only `parquet` is allowed.
- `record_count` is the number of records (rows) in the file.
- `file_size_bytes` is the size of the file in bytes.
- `footer_size` is the size of the file metadata footer; in the case of Parquet, this is the Thrift data. This is an optimization that allows for faster reading of the file.
- `row_id_start` is the first logical row id in the file. (Every row has a unique row id that is maintained.)
- `partition_id` refers to a `partition_id` from the `ducklake_partition_info` table.
- `encryption_key` contains the encryption key for the file if encryption is enabled.
- `partial_file_info` is used when snapshots refer to parts of a file.
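
The interplay of `begin_snapshot`, `end_snapshot`, `file_order`, and `path_is_relative` can be sketched with a small example. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the catalog database, only a subset of the columns is created, the sample rows are made up, the `end_snapshot` bound is treated as exclusive, and the `data_path` value is assumed to end with a path separator. The function name `files_at_snapshot` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the catalog database (assumption).
con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE ducklake_data_file (
        data_file_id     BIGINT PRIMARY KEY,
        table_id         BIGINT,
        begin_snapshot   BIGINT,
        end_snapshot     BIGINT,
        file_order       BIGINT,
        path             VARCHAR,
        path_is_relative BOOLEAN
    )
    """
)

# Made-up sample rows: file 0 was removed in snapshot 3,
# files 1 and 2 are still part of the table (end_snapshot is NULL).
con.executemany(
    "INSERT INTO ducklake_data_file VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (0, 1, 1, 3, 0, "a.parquet", True),
        (1, 1, 2, None, 1, "b.parquet", True),
        (2, 1, 3, None, 2, "/abs/c.parquet", False),
    ],
)


def files_at_snapshot(con, table_id, snapshot_id, data_path):
    """Return resolved file paths visible at snapshot_id, in file_order.

    Assumes end_snapshot is exclusive and data_path ends with a separator.
    """
    rows = con.execute(
        """
        SELECT path, path_is_relative
        FROM ducklake_data_file
        WHERE table_id = ?
          AND begin_snapshot <= ?
          AND (end_snapshot IS NULL OR end_snapshot > ?)
        ORDER BY file_order
        """,
        (table_id, snapshot_id, snapshot_id),
    ).fetchall()
    # Relative paths are resolved against data_path; absolute paths are kept.
    return [data_path + path if is_relative else path
            for path, is_relative in rows]


print(files_at_snapshot(con, 1, 2, "s3://bucket/lake/"))
```

Passing a different `snapshot_id` selects the set of files for that version of the table, which is how the same rows can serve both current reads and reads of older snapshots.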