5.3. Reading Metadata #
5.3.1. Retrieving a Snapshot #
Execute the following query:
SELECT *
FROM pga_snapshot
WHERE snapshot_id = snapshot_ID;
Where snapshot_ID is the ID of the snapshot from the pga_snapshot metadata table.
To retrieve the latest snapshot, execute the following query:
SELECT * FROM pga_snapshot WHERE snapshot_id = (SELECT max(snapshot_id) FROM pga_snapshot);
5.3.2. Retrieving Analytical Schemas #
Execute the following query:
SELECT schema_id, schema_name
FROM pga_schema
WHERE
snapshot_ID >= begin_snapshot AND
(snapshot_ID < end_snapshot OR end_snapshot IS NULL);
Where snapshot_ID is the ID of the snapshot from the pga_snapshot metadata table.
5.3.3. Retrieving Analytical Tables #
Execute the following query:
SELECT table_id, table_name
FROM pga_table
WHERE
schema_id = schema_ID AND
snapshot_ID >= begin_snapshot AND
(snapshot_ID < end_snapshot OR end_snapshot IS NULL);
Where:
schema_ID: The ID of the analytical schema from thepga_schemametadata table where analytical tables are created.snapshot_ID: The ID of the snapshot from thepga_snapshotmetadata table.
5.3.4. Retrieving Columns of an Analytical Table #
Execute the following query:
SELECT column_id, column_name, column_type
FROM pga_column
WHERE
table_id = table_ID AND
snapshot_ID >= begin_snapshot AND
(snapshot_ID < end_snapshot OR end_snapshot IS NULL)
ORDER BY column_order;
Where:
table_ID: The ID of the analytical table from thepga_tablemetadata table where columns are created.snapshot_ID: The ID of the snapshot from thepga_snapshotmetadata table.
5.3.5. Retrieving Filtered Parquet Files #
You can retrieve Parquet files after retrieving columns of an analytical table. Retrieved Parquet can be filtered using statistics from the pga_file_column_statistics metadata table.
To retrieve Parquet files and filter them by column values, execute the following query:
SELECT data_file_id
FROM ducklake_file_column_stats
WHERE
table_id = table_ID AND
column_id = column_ID AND
(SCALAR >= min_value OR min_value IS NULL) AND
(SCALAR <= max_value OR max_value IS NULL);
Where:
table_ID: The ID of the analytical table from thepga_tablemetadata table associated with Parquet files.column_ID: The ID of the column from thepga_columnmetadata table whose values are used for filtering Parquet files.In this example, only Parquet files that do not contain scalar values in the
column_IDcolumn are retrieved.
You can filter column values using different conditions, such as "greater than (>)", by updating the query accordingly.
The minimum and maximum values of each column are stored as arrays and must be converted to integers.