21.1. Functions for Reading OLAP Data from a Storage #

read_parquet(path TEXT or TEXT[], optional_parameters) returns SETOF duckdb.row #

Reads a Parquet file from a storage.

Returns DuckDB rows. You can expand these rows using * or select specific columns using the r['column_name'] syntax. To select specific columns, give the function call an easy alias, such as r:

    SELECT * FROM read_parquet('file.parquet');
    SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
    SELECT COUNT(*) FROM read_parquet('file.parquet');
  

Required parameters:

Name

Type

Description

path

text or text[]

The path to the storage with Parquet files to read. It can be a glob or array of files.

Optional parameters mirror the DuckDB read_parquet function. To specify optional parameters, use parameter := 'value'.

read_csv(path TEXT or TEXT[], optional_parameters) returns SETOF duckdb.row #

Reads a CSV file from a storage.

Returns DuckDB rows. You can expand these rows using * or select specific columns using the r['column_name'] syntax. To select specific columns, give the function call an easy alias, such as r:

  SELECT * FROM read_csv('file.csv');
  SELECT r['id'], r['name'] FROM read_csv('file.csv') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM read_csv('file.csv');
  

Required parameters:

Name

Type

Description

path

text or text[]

The path to the storage with CSV files to read. It can be a glob or array of files.

Optional parameters mirror the DuckDB read_csv function. To specify optional parameters, use parameter := 'value'.

Note

  • columns is not currently supported

  • nullstr must be an array (TEXT[])

read_json(path TEXT or TEXT[], optional_parameters) returns SETOF duckdb.row #

Reads a JSON file from a storage.

Returns DuckDB rows. You can expand these rows using * or select specific columns using the r['column_name'] syntax. To select specific columns, give the function call an easy alias, such as r:

Example 21.1. 

    SELECT * FROM read_parquet('file.parquet');
    SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
    SELECT COUNT(*) FROM read_parquet('file.parquet');
  

Required parameters:

Name

Type

Description

path

text or text[]

The path to the storage with JSON files to read. It can be a glob or array of files.

Optional parameters mirror the DuckDB read_json function. To specify optional parameters, use parameter := 'value'.

Note

columns is not currently supported.

iceberg_scan(path TEXT, optional_parameters) returns SETOF duckdb.row #

Reads an iceberg table from a storage.

Returns DuckDB rows. You can expand these rows using * or select specific columns using the r['column_name'] syntax. To select specific columns, give the function call an easy alias, such as r:

Example 21.2. 

  SELECT * FROM iceberg_scan('data/iceberg/table');
  SELECT r['id'], r['name'] FROM iceberg_scan('data/iceberg/table') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM iceberg_scan('data/iceberg/table');
  

To use iceberg_scan, enable the iceberg extension:

  SELECT duckdb.install_extension('iceberg');
  

Required parameters:

Name

Type

Description

path

text

The path to the storage with the iceberg table to read.

Optional parameters mirror the DuckDB iceberg_scan function. To specify optional parameters, use parameter := 'value'.

Optional parameters:

Name

Type

Default value

Description

allowed_moved_paths

boolean

false

Ensures that path resolution is performed, which allows scanning iceberg tables that are moved.

mode

text

''

metadata_compression_codec

text

'none'

skip_schema_inference

boolean

false

version

text

'version-hint.text'

version_name_format

text

'v%s%s.metadata.json,%s%s.metadata.json'

iceberg_metadata(path TEXT, optional_parameters) returns SETOF iceberg_metadata_record #

Returns metadata of an iceberg table. Metadata is returned as a set of icerberg_metadata_record, which is defined as follows:

  CREATE TYPE duckdb.iceberg_metadata_record AS (
    manifest_path TEXT,
    manifest_sequence_number NUMERIC,
    manifest_content TEXT,
    status TEXT,
    content TEXT,
    file_path TEXT
  );
  

To use iceberg_metadata, enable the iceberg extension:

  SELECT duckdb.install_extension('iceberg');
  

Required parameters:

Name

Type

Description

path

text

The path to the storage with the iceberg table to read.

Optional parameters mirror the DuckDB iceberg_metadata function. To specify optional parameters, use parameter := 'value'.

Optional parameters:

Name

Type

Default value

Description

allowed_moved_paths

boolean

false

Ensures that path resolution is performed, which allows scanning iceberg tables that are moved.

metadata_compression_codec

text

'none'

skip_schema_inference

boolean

false

version

text

'version-hint.text'

version_name_format

text

'v%s%s.metadata.json,%s%s.metadata.json'

iceberg_snapshots (path TEXT, optional_parameters) returns SETOF iceberg_snapshot_record #

Reads snapshot information from an iceberg table.

Returns snapshot metadata for an iceberg table, which can be useful for "time travel" queries and understanding table history.

Example 21.3. 

  SELECT * FROM iceberg_snapshots('data/iceberg/table');
  

To use iceberg_snapshots, enable the iceberg extension:

  SELECT duckdb.install_extension('iceberg');
  

Required parameters:

Name

Type

Description

path

text

The path to the storage with the iceberg table to read.

Optional parameters mirror the DuckDB iceberg_snapshots function. To specify optional parameters, use parameter := 'value'.

Optional parameters:

Name

Type

Default value

metadata_compression_codec

text

'none'

skip_schema_inference

boolean

false

version

text

'version-hint.text'

version_name_format

text

'v%s%s.metadata.json,%s%s.metadata.json'

delta_scan(path TEXT) returns SETOF duckdb.row #

Reads a Delta dataset from a storage.

Returns DuckDB rows. You can expand these rows using * or select specific columns using the r['column_name'] syntax. To select specific columns, give the function call an easy alias, such as r:

Example 21.4. 

  SELECT * FROM delta_scan('/path/to/delta/dataset');
  SELECT r['id'], r['name'] FROM delta_scan('/path/to/delta/dataset') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM delta_scan('/path/to/delta/dataset');
  

To use delta_scan, enable the delta extension:

  SELECT duckdb.install_extension('delta');
  

Required parameters:

Name

Type

Description

path

text

The path to the storage with the Delta dataset to read.