20.3. Deleting Parquet Files #

In Postgres Pro AXE, the deletion of Parquet files is a two-stage process. First, you must mark snapshots referenced by entries from the pga_data_file metadata table as expired to exclude associated Parquet files from analytical queries. Then, you can delete Parquet files.

Before performing this instruction, ensure that you are assigned the metastore_admin role. For more information, refer to Chapter 14.

To delete Parquet files:

  1. Mark snapshots as expired in one of the following ways:

    • To mark snapshots created before a certain date and time as expired, execute the following query:

        SELECT metastore.expire_snapshot(threshold_date_and_time);
      

      Where threshold_date_and_time is the threshold value for the snapshot creation date and time.

      The snapshot creation date and time are contained in the snapshot_time column of the pga_snapshot metadata table.

      Supported date and time formats:

      • Date in the yyyy-mm-dd format.

        Example 20.3. 

        2025-11-28


      • Date in the yyyy-mm-dd format and time in the hh:mm:ss format.

        Example 20.4. 

        2025-11-28 12:22:46


      • Date in the yyyy-mm-dd format and time in the hh:mm:ss format with the time zone.

        Example 20.5. 

        2025-11-28 12:22:46+03


      • Date in the yyyy-mm-dd format and time in the hh:mm:ss.ssssss format with time zone, where the fractional part of seconds must have no more than 6 digits.

        Example 20.6. 

        2025-11-28 12:22:46.123456+03


      Example 20.7. 

        SELECT metastore.expire_snapshot('2025-11-13 12:22:46.123456+03');
      

    • To mark snapshots with certain IDs as expired, execute the following query:

        SELECT metastore.expire_snapshot('[list_of_snapshot_IDs]');
      

      Where list_of_snapshot_IDs is a comma-separated list of snapshot IDs.

      Snapshot IDs are contained in the snapshot_id column of the pga_snapshot metadata table.

      Example 20.8. 

        SELECT metastore.expire_snapshot('[1,2,3,4]');
      

    Once the query is executed, pgpro_metastore performs the following actions:

    1. Marks snapshots as expired if their entries from the pga_snapshot metadata table have snapshot_time values that are less than threshold_date_and_time or have snapshot_id values that are equal to one of the specified snapshot IDs.

    2. Finds entries in the pga_data_file metadata table whose begin_snapshot values reference expired snapshots and creates associated entries in the pga_files_scheduled_for_deletion metadata table if these entities were not already created by previous expire_snapshot calls.

      Note

      Other metadata tables whose begin_snapshot values reference expired snapshots are not impacted.

  2. Recreate Postgres Pro views from analytical tables associated with Parquet files so that these files are excluded from analytical queries.

    Important

    Postgres Pro views can be recreated only during periods of zero user activity to prevent potential data loss.

  3. Delete Parquet files:

    SELECT metastore.delete_expired_files();
    

    Once the query is executed, pgpro_metastore performs the following actions:

    1. Deletes Parquet files from the storage and associated entries from pga_files_scheduled_for_deletion, pga_data_file, and pga_file_column_statistics metadata tables.

    2. Recalculates statistics for entries associated with deleted Parquet files in pga_table_stats and pga_table_column_stats metadata tables.

    3. Recalculates column_order values for entries associated with deleted Parquet files in the pga_column metadata table.