DuckLake specifies multiple different data types for field values, and also supports nested types.
The types of columns are defined in the column_type field of the ducklake_column table.
Primitive Types
| Type | Description |
|---|---|
boolean |
True or false |
int8 |
8-bit signed integer |
int16 |
16-bit signed integer |
int32 |
32-bit signed integer |
int64 |
64-bit signed integer |
uint8 |
8-bit unsigned integer |
uint16 |
16-bit unsigned integer |
uint32 |
32-bit unsigned integer |
uint64 |
64-bit unsigned integer |
int128 |
128-bit signed integer |
uint128 |
128-bit unsigned integer |
float32 |
32-bit IEEE 754 floating-point value |
float64 |
64-bit IEEE 754 floating-point value |
decimal(P, S) |
Fixed-point decimal with precision P and scale S |
time |
Time of day, microsecond precision |
timetz |
Time of day, microsecond precision, with time zone |
date |
Calendar date |
timestamp |
Timestamp, microsecond precision |
timestamptz |
Timestamp, microsecond precision, with time zone |
timestamp_s |
Timestamp, second precision |
timestamp_ms |
Timestamp, millisecond precision |
timestamp_ns |
Timestamp, nanosecond precision |
interval |
Time interval in three different granularities: months, days, and milliseconds |
varchar |
Text |
blob |
Binary data |
json |
JSON |
uuid |
Universally unique identifier |
Nested Types
DuckLake supports nested types and primitive types. Nested types are defined recursively using the parent_column field in the ducklake_column table. For example, a column of type INT[] is stored as two rows: a parent column of type list, and a child column of type int32 whose parent_column points to the parent.
The following nested types are supported:
| Type | Description |
|---|---|
list |
Collection of values with a single child type |
struct |
A tuple of typed values |
map |
A collection of key-value pairs |
Semi-Structured Types
| Type | Description |
|---|---|
variant |
A dynamically typed value that can hold any primitive or nested type, stored in a binary encoding. |
The variant type is similar to JSON but is more strongly typed internally, supports a wider range of types (e.g., date, timestamp, decimal), and is stored in a binary-encoded format rather than as a string. Variants are stored in Parquet files according to the Parquet variant encoding specification.
Variants can be shredded into their constituent primitive types when all rows share a consistent schema for a given sub-field. Shredded fields are stored and queried with the same efficiency as native primitive columns. Per-file statistics for shredded sub-fields are recorded in the ducklake_file_variant_stats table.
Note The
varianttype is natively supported in DuckDB. For catalog databases that do not have a native variant type (e.g., PostgreSQL, SQLite), variants cannot yet be stored as inline values in those catalogs.
Geometry Types
DuckLake supports geometry types using the geometry type of the Parquet format. The geometry type can store different types of spatial representations called geometry primitives, of which DuckLake supports the following:
| Geometry primitive | Description |
|---|---|
point |
A single location in coordinate space. |
linestring |
A sequence of points connected by straight line segments. |
polygon |
A planar surface defined by one exterior boundary and zero or more interior boundaries (holes). |
multipoint |
A collection of point geometries. |
multilinestring |
A collection of linestring geometries. |
multipolygon |
A collection of polygon geometries. |
linestring_z |
A linestring geometry with an additional Z (elevation) coordinate for each point. |
geometrycollection |
A heterogeneous collection of geometry primitives (e.g., points, lines, polygons, etc.). |
Type Encoding for Statistics
Statistics values are string-encoded, as they must be stored as VARCHAR, since thatallows storing types that are not native to the catalog database system (e.g., PostgreSQL does not support VARIANT natively). Statistics are stored in both ducklake_file_column_stats and ducklake_table_column_stats.
Most types follow a straightforward encoding, however some do not. The following table describes the encoding of each type:
| Type | Description | Example |
|---|---|---|
boolean |
0 or 1 |
1 |
int8 |
Integer string | 42 |
int16 |
Integer string | 1000 |
int32 |
Integer string | 100000 |
int64 |
Integer string | 100000000 |
uint8 |
Integer string | 255 |
uint16 |
Integer string | 65535 |
uint32 |
Integer string | 4294967295 |
uint64 |
Integer string | 18446744073709551615 |
float32 |
Numeric string with special values: inf, -inf. NaN values are excluded from min/max, the contains_nan flag indicates their presence |
3.14 |
float64 |
Numeric string with special values: inf, -inf. NaN values are excluded from min/max, the contains_nan flag indicates their presence |
3.14159265358979 |
decimal |
Numeric string (independent of precision and scale) | 12345.67 |
date |
ISO 8601 date with special values: infinity, -infinity |
2024-01-15 |
time |
ISO 8601 time with microseconds if non-zero | 12:30:00.123456 |
timestamp |
ISO 8601 timestamp with microseconds if non-zero, with special values: infinity, -infinity |
2024-01-15 12:30:00.123456 |
timestamptz |
ISO 8601 timestamp with UTC offset, with special values: infinity, -infinity |
2024-01-15 12:30:00.123456+00 |
timestamp_s |
ISO 8601 timestamp (second precision) with special values: infinity, -infinity |
2024-01-15 12:30:00 |
timestamp_ms |
ISO 8601 timestamp (millisecond precision) with special values: infinity, -infinity |
2024-01-15 12:30:00.123 |
timestamp_ns |
ISO 8601 timestamp (nanosecond precision) with special values: infinity, -infinity |
2024-01-15 12:30:00.123456789 |
varchar |
As-is | hello |
json |
As-is | {"key": "value"} |
blob |
Hex-encoded string of the raw bytes | 68656C6C6F20776F726C64 |
uuid |
Standard UUID string | 550e8400-e29b-41d4-a716-446655440000 |
The following types do not currently have min/max statistics, as they are not supported by the underlying Parquet format:
int128uint128timetzinterval
Nested Types
Nested types (list, struct, map) do not have min/max statistics themselves, statistics are collected recursively for their child columns instead. For example:
Given the following table:
CREATE TABLE nested_types (
col_list INT[],
col_struct STRUCT(a INT, b VARCHAR),
col_map MAP(VARCHAR, INT)
);
INSERT INTO nested_types VALUES
([1, 2, 3], {'a': 10, 'b': 'hello'}, MAP {'x': 1}),
([4, 5, 6], {'a': 20, 'b': 'world'}, MAP {'y': 2});
The parent columns (col_list, col_struct, col_map) have no min/max statistics. Instead, the child columns store:
| Child column | Type | min | max |
|---|---|---|---|
element |
int32 |
1 |
6 |
a |
int32 |
10 |
20 |
b |
varchar |
hello |
world |
key |
varchar |
x |
y |
value |
int32 |
1 |
2 |
Extra Statistics
The geometry and variant types do not use string-encoded min/max values. Instead, their statistics are stored as JSON in the extra_stats column of the statistics tables.
Geometry
Geometry statistics contain a bounding box (bbox) and a list of geometry primitive types present in the column (types). For example:
{
"bbox": {
"xmin": 0.000000,
"xmax": 5.000000,
"ymin": 0.000000,
"ymax": 5.000000,
"zmin": null,
"zmax": null,
"mmin": null,
"mmax": null
},
"types": ["point"]
}
Variant
Variant statistics contain a JSON array with one entry per shredded sub-field. Each entry records the field path, its shredded type, and min/max statistics for that sub-field. If the variant cannot be shredded (i.e., has inconsistent types across rows), extra_stats is NULL. An example of a Variant extra_stats:
[
{
"field_name": "root",
"shredded_type": "int32",
"null_count": 0,
"min": "42",
"max": "99",
"num_values": 3,
"column_size_bytes": 37,
"any_valid": true
}
]
Type Encoding for Data Inlining
When storing columns for data inlining, a table is created in the catalog to store small inserts and updates in the database. Since DuckDB supports all DuckDB types, inlining types in a DuckDB catalog is straightforward. However, other catalogs might require encodings for non-native types.
PostgreSQL
PostgreSQL does not natively support all DuckDB types. The following table describes how each DuckDB type is mapped to a PostgreSQL type for inlined data storage. Types not listed below (e.g., BOOLEAN, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, TIME, TIMETZ, INTERVAL, UUID) are stored using their native PostgreSQL equivalents.
| DuckDB Type | PostgreSQL Type | Encoding | Example |
|---|---|---|---|
TINYINT |
SMALLINT |
Direct cast | 42 -> 42 |
UTINYINT |
INTEGER |
Direct cast | 200 -> 200 |
USMALLINT |
INTEGER |
Direct cast | 50000 -> 50000 |
UINTEGER |
BIGINT |
Direct cast | 3000000000 -> 3000000000 |
UBIGINT |
VARCHAR |
Stored as text | 18446744073709551615 -> '18446744073709551615' |
HUGEINT |
VARCHAR |
Stored as text | -170141183460469231731687303715884105728 -> '-170141183460469231731687303715884105728' |
UHUGEINT |
VARCHAR |
Stored as text | 340282366920938463463374607431768211455 -> '340282366920938463463374607431768211455' |
VARCHAR |
BYTEA |
Text reinterpreted as bytes. PostgreSQL cannot store null bytes in VARCHAR/TEXT |
'hello world' -> \x68656c6c6f20776f726c64 |
BLOB |
BYTEA |
Binary data stored directly as BYTEA |
'\x00\xFF'::BLOB -> \x00ff |
JSON |
BYTEA |
Text reinterpreted as bytes. DuckDB's JSON type is internally VARCHAR, so it follows the same as VARCHAR to BYTEA |
'{"key": "value"}' -> \x7b226b6579223a202276616c7565227d |
DATE |
VARCHAR |
Stored as text | 2024-01-15 -> '2024-01-15' |
TIMESTAMP |
VARCHAR |
Stored as text | 2024-01-15 12:30:00.123456 -> '2024-01-15 12:30:00.123456' |
TIMESTAMPTZ |
VARCHAR |
Stored as text | 2024-01-15 12:30:00.123456+00 -> '2024-01-15 12:30:00.123456+00' |
TIMESTAMP_S |
VARCHAR |
Stored as text | 2024-01-15 12:30:00 -> '2024-01-15 12:30:00' |
TIMESTAMP_MS |
VARCHAR |
Stored as text | 2024-01-15 12:30:00.123 -> '2024-01-15 12:30:00.123' |
TIMESTAMP_NS |
VARCHAR |
Stored as text | 2024-01-15 12:30:00.123456789 -> '2024-01-15 12:30:00.123456789' |
| Nested types | VARCHAR |
Stored as text, similar to stats | [1, 2, 3] -> '[1, 2, 3]' |
SQLite
SQLite has a flexible type system with only five storage classes: INTEGER, REAL, TEXT, BLOB, and NULL. DuckDB's SQLite scanner maps all integer types to BIGINT (i.e., SQLite's INTEGER type) and most other types to VARCHAR (SQLite's TEXT type). Only BLOB is stored as BLOB. The following table describes the non-trivial type mappings:
| DuckDB Type | SQLite Type | Encoding | Example |
|---|---|---|---|
BOOLEAN |
BIGINT |
Stored as integer (0/1) |
true -> 1 |
TINYINT, SMALLINT, INTEGER, BIGINT |
BIGINT |
All integer types use SQLite's INTEGER affinity |
42 -> 42 |
UTINYINT, USMALLINT, UINTEGER |
BIGINT |
Unsigned integers fit within BIGINT range |
200 -> 200 |
UBIGINT |
VARCHAR |
Stored as text | 18446744073709551615 -> '18446744073709551615' |
HUGEINT |
VARCHAR |
Stored as text | -170141183460469231731687303715884105728 -> '-170141183460469231731687303715884105728' |
UHUGEINT |
VARCHAR |
Stored as text | 340282366920938463463374607431768211455 -> '340282366920938463463374607431768211455' |
FLOAT |
VARCHAR |
Stored as text | 3.14 -> '3.14' |
DOUBLE |
VARCHAR |
Stored as text | 3.14159265358979 -> '3.14159265358979' |
DECIMAL |
VARCHAR |
Stored as text | 12345.67 -> '12345.67' |
UUID |
VARCHAR |
Stored as text | 550e8400-... -> '550e8400-...' |
DATE, TIME, TIMETZ |
VARCHAR |
Stored as text | 2024-01-15 -> '2024-01-15' |
TIMESTAMP, TIMESTAMPTZ |
VARCHAR |
Stored as text | 2024-01-15 12:30:00.123456 -> '2024-01-15 12:30:00.123456' |
TIMESTAMP_S, TIMESTAMP_MS, TIMESTAMP_NS |
VARCHAR |
Stored as text | 2024-01-15 12:30:00.123 -> '2024-01-15 12:30:00.123' |
INTERVAL |
VARCHAR |
Stored as text | 1 year 2 months 3 days -> '1 year 2 months 3 days' |
| Nested types | VARCHAR |
Stored as text, similar to stats | [1, 2, 3] -> '[1, 2, 3]' |