Chapter 29. Configuring Objects Visibility
Table of Contents
You can execute transactional ETL operations on pgpro_metastore objects. Similarly to Postgres Pro, snapshots are used to isolate procedures. Snapshots are monotonically increasing sequences of numbers. Each object has a snapshot number associated with its creation and deletion.
ETL operations use snapshot numbers to determine the visibility of pgpro_metastore objects. By default, the maximum snapshot number is used to read the latest state of pgpro_metastore. For example, you can request a list of Parquet files of an analytical table by executing the following query:
Example 29.1.
SELECT data.path AS data_file_path
FROM ducklake_data_file AS data
WHERE
data.table_id = TABLE_ID AND
SNAPSHOT_ID >= data.begin_snapshot AND
(SNAPSHOT_ID < data.end_snapshot OR data.end_snapshot IS NULL)
ORDER BY file_order;
If an ETL operation adds or deletes a pgpro_metastore object, it creates a new snapshot and specifies the snapshot number for the begin_snapshot or end_snapshot value of the object depending on the operation type.
While metadata tables are stored on the pgpro_metastore catalog server, snapshots are not associated with Postgres Pro transaction snapshots. This allows executing ETL operations even if the pgpro_metastore catalog server is temporarily unavailable after reloading. The downside of the separate snapshots implementation is that it requires a separate conflict resolution mechanism when two snapshots modify the same objects.
pgpro_metastore also uses snapshots of the Postgres Pro server where ETL operations are executed. Currently, executing multiple operations in the same transaction block is not supported. Even if source Postgres Pro tables stay consistent during the transaction, they will be added to pgpro_metastore in separate snapshots.