28.5. Retrieving Filtered Parquet Files
You can retrieve Parquet files after retrieving the columns of an analytical table. The retrieved Parquet files can be filtered using statistics from the pga_file_column_statistics metadata table.
To retrieve Parquet files and filter them by column values, execute the following query:
SELECT data_file_id
FROM ducklake_file_column_stats
WHERE
table_id = table_ID AND
column_id = column_ID AND
(SCALAR >= min_value OR min_value IS NULL) AND
(SCALAR <= max_value OR max_value IS NULL);
Where:
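table_ID: The ID of the analytical table from the pga_table metadata table associated with the Parquet files.
column_ID: The ID of the column from the pga_column metadata table whose values are used for filtering Parquet files.
SCALAR: The value that the column_ID column is compared against.
In this example, only Parquet files whose minimum and maximum statistics show that the column_ID column may contain the SCALAR value are retrieved. Files whose value range excludes SCALAR are skipped.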
You can filter column values using different conditions, such as "greater than (>)", by updating the query accordingly.
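As a rough sketch of such a variation, the query below keeps only the files that may contain values greater than a given number. The IDs 1 and 2 and the value 42 are hypothetical placeholders, the table name follows the query shown in this section, and the cast assumes that the stored statistics can be converted to integer directly.

```sql
-- Hypothetical values: table_id 1, column_id 2, condition "column > 42".
-- A file can hold a value greater than 42 only if its maximum exceeds 42.
-- Files without statistics (NULL) are kept because they cannot be ruled out.
SELECT data_file_id
FROM ducklake_file_column_stats
WHERE
    table_id = 1 AND
    column_id = 2 AND
    (CAST(max_value AS integer) > 42 OR max_value IS NULL);
```

For a "less than" condition, the same reasoning applies to min_value instead of max_value.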
The minimum and maximum values of each column are stored as strings and must be cast to the actual type of the column, for example to integers, before they are compared.
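The conversion could be applied to the range query from this section roughly as follows. This is a sketch under assumptions: the IDs 1 and 2 and the value 42 are hypothetical, the filtered column is assumed to hold integers, and the stored statistics are assumed to be castable with a plain CAST.

```sql
-- Hypothetical values: table_id 1, column_id 2, SCALAR = 42.
-- min_value and max_value are converted to integer before the comparison
-- so that the range check is numeric.
SELECT data_file_id
FROM ducklake_file_column_stats
WHERE
    table_id = 1 AND
    column_id = 2 AND
    (42 >= CAST(min_value AS integer) OR min_value IS NULL) AND
    (42 <= CAST(max_value AS integer) OR max_value IS NULL);
```

For columns of other types, the target type of the cast would need to match the column type.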