28.5. Retrieving Filtered Parquet Files

After retrieving the columns of an analytical table, you can retrieve the Parquet files associated with it. The list of files can be narrowed using the per-file column statistics from the pga_file_column_statistics metadata table.

To retrieve Parquet files and filter them by column values, execute the following query:

SELECT data_file_id
FROM pga_file_column_statistics
WHERE
    table_id = table_ID AND
    column_id = column_ID AND
    (SCALAR >= min_value OR min_value IS NULL) AND
    (SCALAR <= max_value OR max_value IS NULL);

Where:

  • table_ID: The ID of the analytical table from the pga_table metadata table associated with Parquet files.

  • column_ID: The ID of the column from the pga_column metadata table whose values are used for filtering Parquet files.

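  • SCALAR: The value that the column_ID column is compared against.

In this example, only Parquet files whose range from min_value to max_value for the column_ID column includes SCALAR are retrieved, that is, files that may contain this value. Files with no recorded minimum or maximum value are also retrieved, because they cannot be ruled out.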
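To illustrate which files the query returns, the following sketch runs the same WHERE clause against an in-memory SQLite table that stands in for pga_file_column_statistics. The SQLite harness, the sample rows, the integer column types, and the candidate_files helper are assumptions made for this example only. The table and column names are the ones used in this section.

```python
import sqlite3

# In-memory stand-in for the metadata table. The real table lives in the
# database; all rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pga_file_column_statistics ("
    " data_file_id INTEGER, table_id INTEGER, column_id INTEGER,"
    " min_value INTEGER, max_value INTEGER)"
)
conn.executemany(
    "INSERT INTO pga_file_column_statistics VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 0, 99),       # range 0..99
        (2, 1, 2, 100, 199),    # range 100..199
        (3, 1, 2, None, None),  # no statistics recorded
        (4, 1, 3, 100, 199),    # statistics of a different column
    ],
)


def candidate_files(conn, table_id, column_id, scalar):
    """Return IDs of files whose min/max range may contain `scalar`."""
    rows = conn.execute(
        "SELECT data_file_id FROM pga_file_column_statistics"
        " WHERE table_id = ? AND column_id = ?"
        " AND (? >= min_value OR min_value IS NULL)"
        " AND (? <= max_value OR max_value IS NULL)"
        " ORDER BY data_file_id",
        (table_id, column_id, scalar, scalar),
    ).fetchall()
    return [row[0] for row in rows]


# File 2 covers 150; file 3 has no bounds, so it cannot be excluded.
print(candidate_files(conn, 1, 2, 150))  # [2, 3]
```

File 1 is skipped because 150 lies outside its range, and file 4 is skipped because its statistics belong to another column.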

To filter by a different condition, such as "greater than" (>), change the comparisons in the WHERE clause accordingly.
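One possible way to adapt the comparisons is sketched below. For each operator, a file is kept when its recorded range could hold a matching row, or when the relevant bound is missing. The PRUNING_CONDITIONS mapping, the files_for helper, the in-memory SQLite table, and the sample rows are hypothetical and serve only to make the example runnable.

```python
import sqlite3

# Hypothetical mapping from a comparison on the column to a condition on
# the per-file statistics. :v is the value being compared against.
PRUNING_CONDITIONS = {
    "=": "(:v >= min_value OR min_value IS NULL)"
         " AND (:v <= max_value OR max_value IS NULL)",
    ">": "(max_value > :v OR max_value IS NULL)",
    ">=": "(max_value >= :v OR max_value IS NULL)",
    "<": "(min_value < :v OR min_value IS NULL)",
    "<=": "(min_value <= :v OR min_value IS NULL)",
}

# In-memory stand-in for the metadata table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pga_file_column_statistics ("
    " data_file_id INTEGER, table_id INTEGER, column_id INTEGER,"
    " min_value INTEGER, max_value INTEGER)"
)
conn.executemany(
    "INSERT INTO pga_file_column_statistics VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, 0, 99), (2, 1, 2, 100, 199), (3, 1, 2, None, None)],
)


def files_for(conn, table_id, column_id, op, value):
    """Return IDs of files that may hold rows where `column op value`."""
    sql = (
        "SELECT data_file_id FROM pga_file_column_statistics"
        " WHERE table_id = :t AND column_id = :c AND "
        + PRUNING_CONDITIONS[op]
        + " ORDER BY data_file_id"
    )
    params = {"t": table_id, "c": column_id, "v": value}
    return [row[0] for row in conn.execute(sql, params)]


print(files_for(conn, 1, 2, ">", 150))  # [2, 3]
print(files_for(conn, 1, 2, "<", 100))  # [1, 3]
```

For "greater than", only the upper bound matters: a file whose maximum does not exceed the value cannot contain a matching row.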

The minimum and maximum values of each column are stored as strings, so for numeric columns they must be cast to the appropriate numeric type (for example, an integer type) before they are compared.
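The effect of the conversion can be sketched as follows, assuming the bounds are kept in text form. The example again uses an in-memory SQLite table with made-up rows as a stand-in for the metadata table, and the file_ids helper is hypothetical. It compares the result of a text comparison with the result obtained after casting the bounds to integers.

```python
import sqlite3

# Stand-in metadata table; min_value and max_value are TEXT here, which is
# an assumption of this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pga_file_column_statistics ("
    " data_file_id INTEGER, table_id INTEGER, column_id INTEGER,"
    " min_value TEXT, max_value TEXT)"
)
conn.executemany(
    "INSERT INTO pga_file_column_statistics VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 2, "9", "100"),    # numerically covers 50
        (2, 1, 2, "200", "300"),  # numerically does not cover 50
    ],
)


def file_ids(where_tail, params):
    """Run the pruning query with the given range condition."""
    sql = (
        "SELECT data_file_id FROM pga_file_column_statistics"
        " WHERE table_id = 1 AND column_id = 2 AND " + where_tail
        + " ORDER BY data_file_id"
    )
    return [row[0] for row in conn.execute(sql, params)]


# Without a cast the bounds are compared as text: '50' sorts before '9',
# so file 1 is wrongly pruned.
text_ids = file_ids(
    "(? >= min_value OR min_value IS NULL)"
    " AND (? <= max_value OR max_value IS NULL)",
    ("50", "50"),
)

# Casting both bounds to an integer restores numeric comparison.
int_ids = file_ids(
    "(? >= CAST(min_value AS INTEGER) OR min_value IS NULL)"
    " AND (? <= CAST(max_value AS INTEGER) OR max_value IS NULL)",
    (50, 50),
)

print(text_ids)  # []
print(int_ids)   # [1]
```

In PostgreSQL the same cast can be written as CAST(min_value AS integer) or min_value::integer. Choose the target type to match the data type of the filtered column.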