21.1. Functions for Reading OLAP Data from Storage
- `read_parquet(path text or text[], optional_parameters)` returns `SETOF duckdb.row`

  Reads one or more Parquet files from storage and returns DuckDB rows. You can expand these rows using `*` or select specific columns using the `r['column_name']` syntax. To select specific columns, give the function call a short alias, such as `r`:

  ```sql
  SELECT * FROM read_parquet('file.parquet');
  SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM read_parquet('file.parquet');
  ```

  Required parameters:

  | Name | Type | Description |
  |------|------|-------------|
  | path | text or text[] | The path to the storage with Parquet files to read. It can be a glob or an array of files. |

  Optional parameters mirror the DuckDB `read_parquet` function. To specify optional parameters, use `parameter := 'value'`.
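As a sketch of the `parameter := 'value'` syntax, the queries below pass DuckDB `read_parquet` options through the function; the file paths are hypothetical, while `hive_partitioning` and `union_by_name` are standard DuckDB `read_parquet` options:

```sql
-- Read a glob of Hive-partitioned files; partition columns become result columns.
SELECT * FROM read_parquet('data/events/*.parquet', hive_partitioning := true);

-- Read an array of files whose schemas may differ, unifying columns by name.
SELECT * FROM read_parquet(ARRAY['a.parquet', 'b.parquet'], union_by_name := true);
```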
- `read_csv(path text or text[], optional_parameters)` returns `SETOF duckdb.row`

  Reads one or more CSV files from storage and returns DuckDB rows. You can expand these rows using `*` or select specific columns using the `r['column_name']` syntax. To select specific columns, give the function call a short alias, such as `r`:

  ```sql
  SELECT * FROM read_csv('file.csv');
  SELECT r['id'], r['name'] FROM read_csv('file.csv') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM read_csv('file.csv');
  ```

  Required parameters:

  | Name | Type | Description |
  |------|------|-------------|
  | path | text or text[] | The path to the storage with CSV files to read. It can be a glob or an array of files. |

  Optional parameters mirror the DuckDB `read_csv` function. To specify optional parameters, use `parameter := 'value'`.

  Note: `columns` is not currently supported; `nullstr` must be an array (`text[]`).
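For illustration, the following query passes several DuckDB `read_csv` options through the function; the file name is hypothetical, and `delim`, `header`, and `nullstr` are standard DuckDB `read_csv` options (note that `nullstr` is passed as an array, as required here):

```sql
-- Semicolon-delimited file with a header row; treat 'NULL' and 'N/A' as NULL.
SELECT *
FROM read_csv('data.csv',
              delim := ';',
              header := true,
              nullstr := ARRAY['NULL', 'N/A']);
```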
- `read_json(path text or text[], optional_parameters)` returns `SETOF duckdb.row`

  Reads one or more JSON files from storage and returns DuckDB rows. You can expand these rows using `*` or select specific columns using the `r['column_name']` syntax. To select specific columns, give the function call a short alias, such as `r`:

  Example 21.1.

  ```sql
  SELECT * FROM read_json('file.json');
  SELECT r['id'], r['name'] FROM read_json('file.json') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM read_json('file.json');
  ```

  Required parameters:

  | Name | Type | Description |
  |------|------|-------------|
  | path | text or text[] | The path to the storage with JSON files to read. It can be a glob or an array of files. |

  Optional parameters mirror the DuckDB `read_json` function. To specify optional parameters, use `parameter := 'value'`.

  Note: `columns` is not currently supported.
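As a sketch of optional-parameter usage, the query below reads a hypothetical file containing a single top-level JSON array; `format` and `ignore_errors` are standard DuckDB `read_json` options:

```sql
-- Parse a top-level JSON array of objects, skipping malformed records.
SELECT r['id'], r['payload']
FROM read_json('events.json', format := 'array', ignore_errors := true) r;
```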
- `iceberg_scan(path text, optional_parameters)` returns `SETOF duckdb.row`

  Reads an Iceberg table from storage and returns DuckDB rows. You can expand these rows using `*` or select specific columns using the `r['column_name']` syntax. To select specific columns, give the function call a short alias, such as `r`:

  Example 21.2.

  ```sql
  SELECT * FROM iceberg_scan('data/iceberg/table');
  SELECT r['id'], r['name'] FROM iceberg_scan('data/iceberg/table') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM iceberg_scan('data/iceberg/table');
  ```

  To use `iceberg_scan`, enable the iceberg extension:

  ```sql
  SELECT duckdb.install_extension('iceberg');
  ```

  Required parameters:

  | Name | Type | Description |
  |------|------|-------------|
  | path | text | The path to the storage with the Iceberg table to read. |

  Optional parameters mirror the DuckDB `iceberg_scan` function. To specify optional parameters, use `parameter := 'value'`.

  Optional parameters:

  | Name | Type | Default value | Description |
  |------|------|---------------|-------------|
  | allowed_moved_paths | boolean | false | Ensures that path resolution is performed, which allows scanning Iceberg tables that have been moved. |
  | mode | text | '' | — |
  | metadata_compression_codec | text | 'none' | — |
  | skip_schema_inference | boolean | false | — |
  | version | text | 'version-hint.text' | — |
  | version_name_format | text | 'v%s%s.metadata.json,%s%s.metadata.json' | — |
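For illustration, the query below passes the `allowed_moved_paths` option listed above (using the parameter name as documented here); the table path is hypothetical:

```sql
-- Scan an Iceberg table that has been relocated since its metadata was written.
SELECT COUNT(*)
FROM iceberg_scan('data/iceberg/table', allowed_moved_paths := true);
```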
- `iceberg_metadata(path text, optional_parameters)` returns `SETOF duckdb.iceberg_metadata_record`

  Returns metadata of an Iceberg table. Metadata is returned as a set of `duckdb.iceberg_metadata_record` values, where the type is defined as follows:

  ```sql
  CREATE TYPE duckdb.iceberg_metadata_record AS (
      manifest_path TEXT,
      manifest_sequence_number NUMERIC,
      manifest_content TEXT,
      status TEXT,
      content TEXT,
      file_path TEXT
  );
  ```

  To use `iceberg_metadata`, enable the iceberg extension:

  ```sql
  SELECT duckdb.install_extension('iceberg');
  ```

  Required parameters:

  | Name | Type | Description |
  |------|------|-------------|
  | path | text | The path to the storage with the Iceberg table to read. |

  Optional parameters mirror the DuckDB `iceberg_metadata` function. To specify optional parameters, use `parameter := 'value'`.

  Optional parameters:

  | Name | Type | Default value | Description |
  |------|------|---------------|-------------|
  | allowed_moved_paths | boolean | false | Ensures that path resolution is performed, which allows scanning Iceberg tables that have been moved. |
  | metadata_compression_codec | text | 'none' | — |
  | skip_schema_inference | boolean | false | — |
  | version | text | 'version-hint.text' | — |
  | version_name_format | text | 'v%s%s.metadata.json,%s%s.metadata.json' | — |
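As a usage sketch (with a hypothetical table path), the query below selects fields from the `duckdb.iceberg_metadata_record` type defined above to list the data files referenced by a table's manifests:

```sql
-- Inspect which data files the table's manifests currently reference.
SELECT file_path, status, content
FROM iceberg_metadata('data/iceberg/table');
```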
- `iceberg_snapshots(path text, optional_parameters)` returns `SETOF iceberg_snapshot_record`

  Reads snapshot information from an Iceberg table. Returns snapshot metadata, which can be useful for "time travel" queries and for understanding table history.

  Example 21.3.

  ```sql
  SELECT * FROM iceberg_snapshots('data/iceberg/table');
  ```

  To use `iceberg_snapshots`, enable the iceberg extension:

  ```sql
  SELECT duckdb.install_extension('iceberg');
  ```

  Required parameters:

  | Name | Type | Description |
  |------|------|-------------|
  | path | text | The path to the storage with the Iceberg table to read. |

  Optional parameters mirror the DuckDB `iceberg_snapshots` function. To specify optional parameters, use `parameter := 'value'`.

  Optional parameters:

  | Name | Type | Default value |
  |------|------|---------------|
  | metadata_compression_codec | text | 'none' |
  | skip_schema_inference | boolean | false |
  | version | text | 'version-hint.text' |
  | version_name_format | text | 'v%s%s.metadata.json,%s%s.metadata.json' |
- `delta_scan(path text)` returns `SETOF duckdb.row`

  Reads a Delta dataset from storage and returns DuckDB rows. You can expand these rows using `*` or select specific columns using the `r['column_name']` syntax. To select specific columns, give the function call a short alias, such as `r`:

  Example 21.4.

  ```sql
  SELECT * FROM delta_scan('/path/to/delta/dataset');
  SELECT r['id'], r['name'] FROM delta_scan('/path/to/delta/dataset') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM delta_scan('/path/to/delta/dataset');
  ```

  To use `delta_scan`, enable the delta extension:

  ```sql
  SELECT duckdb.install_extension('delta');
  ```

  Required parameters:

  | Name | Type | Description |
  |------|------|-------------|
  | path | text | The path to the storage with the Delta dataset to read. |