5.3. Reading Metadata #

5.3.1. Retrieving a Snapshot #

Execute the following query:

SELECT *
FROM pga_snapshot
WHERE snapshot_id = snapshot_ID;

Where snapshot_ID is the ID of the snapshot from the pga_snapshot metadata table.

To retrieve the latest snapshot, execute the following query:

SELECT *
FROM pga_snapshot
WHERE snapshot_id =
  (SELECT max(snapshot_id) FROM pga_snapshot);

5.3.2. Retrieving Analytical Schemas #

Execute the following query:

SELECT schema_id, schema_name
FROM pga_schema
WHERE
    snapshot_ID >= begin_snapshot AND
    (snapshot_ID < end_snapshot OR end_snapshot IS NULL);

Where snapshot_ID is the ID of the snapshot from the pga_snapshot metadata table.

5.3.3. Retrieving Analytical Tables #

Execute the following query:

SELECT table_id, table_name
FROM pga_table
WHERE
    schema_id = schema_ID AND
    snapshot_ID >= begin_snapshot AND
    (snapshot_ID < end_snapshot OR end_snapshot IS NULL);

Where:

5.3.4. Retrieving Columns of an Analytical Table #

Execute the following query:

SELECT column_id, column_name, column_type
FROM pga_column
WHERE
    table_id = table_ID AND
    snapshot_ID >= begin_snapshot AND
    (snapshot_ID < end_snapshot OR end_snapshot IS NULL)
ORDER BY column_order;

Where:

5.3.5. Retrieving Filtered Parquet Files #

You can retrieve Parquet files after retrieving columns of an analytical table. Retrieved Parquet can be filtered using statistics from the pga_file_column_statistics metadata table.

To retrieve Parquet files and filter them by column values, execute the following query:

SELECT data_file_id
FROM ducklake_file_column_stats
WHERE
    table_id = table_ID AND
    column_id = column_ID AND
    (SCALAR >= min_value OR min_value IS NULL) AND
    (SCALAR <= max_value OR max_value IS NULL);

Where:

  • table_ID: The ID of the analytical table from the pga_table metadata table associated with Parquet files.

  • column_ID: The ID of the column from the pga_column metadata table whose values are used for filtering Parquet files.

    In this example, only Parquet files that do not contain scalar values in the column_ID column are retrieved.

You can filter column values using different conditions, such as "greater than (>)", by updating the query accordingly.

The minimum and maximum values of each column are stored as arrays and must be converted to integers.