5.4. Configuring Objects Visibility #

You can execute transactional ETL operations on pgpro_metastore objects. Similarly to Postgres Pro, snapshots are used to isolate procedures. Snapshots are monotonically increasing sequences of numbers. Each object has a snapshot number associated with its creation and deletion.

ETL operations use snapshot numbers to determine the visibility of pgpro_metastore objects. By default, the maximum snapshot number is used to read the latest state of pgpro_metastore. For example, you can request a list of Parquet files of an analytical table by executing the following query:

Example 5.2. 

  SELECT data.path AS data_file_path
  FROM ducklake_data_file AS data
  WHERE
      data.table_id = TABLE_ID AND
      SNAPSHOT_ID >= data.begin_snapshot AND
      (SNAPSHOT_ID < data.end_snapshot OR data.end_snapshot IS NULL)
  ORDER BY file_order;

If an ETL operation adds or deletes a pgpro_metastore object, it creates a new snapshot and specifies the snapshot number for the begin_snapshot or end_snapshot value of the object depending on the operation type.

While metadata tables are stored on the pgpro_metastore catalog server, snapshots are not associated with Postgres Pro transaction snapshots. This allows executing ETL operations even if the pgpro_metastore catalog server is temporarily unavailable after reloading. The downside of the separate snapshots implementation is that it requires a separate conflict resolution mechanism when two snapshots modify the same objects.

pgpro_metastore also uses snapshots of the Postgres Pro server where ETL operations are executed. Currently, executing multiple operations in the same transaction block is not supported. Even if source Postgres Pro tables stay consistent during the transaction, they will be added to pgpro_metastore in separate snapshots.

5.4.1. Isolation Guarantees #

In terms of isolation, there are two types of ETL operations:

  • Selecting from analytical tables.

    For these operations, standard Postgres Pro views are used. Currently, reading a view associated with an analytical table returns the latest version of data, even for transactions with the repeatable read isolation level or higher. In this case, only committed OLAP data is visible.

    Storage mechanisms are used to provide isolation.

  • Other ETL operations with pgpro_metastore.

    During the execution of these procedures, read-only queries are sent to pgpro_metastore, and all queries updating metadata are executed at the end in a single serialized transaction. This allows resolving most of the conflicts. The additional triggers logic is used when required.

    For more information, refer to Section 3.2, Section 3.3, Section 3.6, and Section 3.7.

5.4.2. Atomicity Guarantees #

ETL operations also support the atomicity of object updates. If an operation is canceled, all intermediate data is deleted, and updates of metadata tables are not committed.

Even if procedures update pgpro_metastore objects in one transaction, additional service transactions are committed to log data updates. Logs allow canceling updates when the server fails or reloads. You can do this by executing the following query:

  SELECT metastore.cleanup();