3.7. OLAP Data #

3.7.1. Adding the OLAP Data to an Analytical Table From Parquet Files #

Execute the following query:

  SELECT metastore.add_files('table_name', 'path_to_Parquet_files');

Where:

  • table_name: The name of the analytical table to which the OLAP data will be added.

  • path_to_Parquet_files: The path to the Parquet file or to the shared directory with Parquet files from which the OLAP data will be added.

    Enter the path relative to the shared directory from the pga_folder metadata table, starting with the name of this directory.

    Metadata of Parquet files and the analytical table must be compatible.

Example 3.31. 

  SELECT add_files('example_table', 'folder/file.parquet');

Once the query is executed, pgpro_metastore performs the following actions:

  1. Verifies user privileges.

  2. Ensures that metadata of Parquet files and the analytical table are compatible.

  3. Copies Parquet files to the shared directory with Parquet files of the analytical table.

  4. Creates a new snapshot in the pga_snapshot metadata table and a new Parquet file in the pga_data_file metadata table.

  5. Updates analytical table and column statistics.

3.7.2. Copying the OLAP Data to an Analytical Table From Other Tables #

Execute the following query:

  SELECT metastore.copy_table('table_name', SQL_query);

Where:

  • table_name: The name of the analytical table to which the OLAP data will be copied.

  • SQL_query: The SQL query that determines which the OLAP data to copy, for example, SELECT * FROM my_pgtable.

    The SQL query must be a single valid SELECT statement. The UNION operation is not supported.

Example 3.32. 

  SELECT copy_table('example_table', $$select * from example_pg_table$$);

Once the query is executed, pgpro_metastore performs the following actions:

  1. Verifies user privileges.

  2. Creates a new snapshot in the pga_snapshot metadata table.

  3. Creates a directory for results of the SQL query that will be saved as Parquet files.

  4. Executes the SQL query and writes its results to Parquet files.

  5. Adds the OLAP data to the analytical table.

3.7.3. Deleting Parquet Files #

It is recommended to delete Parquet files during periods of low user activity, since Parquet files are not only deleted from storages, but also from all associated metadata tables.

To delete Parquet files:

  1. Mark snapshots associated with Parquet files as expired:

    SELECT metastore.expire_snapshot(control_time);
    

    Where control_time is the threshold value for the snapshot creation date and time.

    You can specify this value in one of the following ways:

    • As a date in the yyyy-mm-dd format.

      Example 3.33. 

      2025-11-28


    • As a date in the yyyy-mm-dd format and time in the hh:mm:ss format.

      Example 3.34. 

      2025-11-28 12:22:46


    • As a date in the yyyy-mm-dd format and time in the hh:mm:ss format with time zone.

      Example 3.35. 

      2025-11-28 12:22:46+03


    • As a date in the yyyy-mm-dd format and time in the hh:mm:ss.ssssss format with time zone, where the fractional part of seconds must have no more than 6 digits.

      Example 3.36. 

      2025-11-28 12:22:46.123456+03


    Once the query is executed, pgpro_metastore performs the following actions:

    1. Marks all snapshots whose snapshot_time value is less than control_time as expired.

    2. Finds Parquet files in the pga_data_file metadata table whose begin_snapshot values reference expired snapshots and creates associated Parquet files in the pga_files_scheduled_for_deletion metadata table if they were not created by previous expire_snapshot calls.

    Note

    Expired snapshots currently affect only the pga_data_file metadata table and are not supported for other metadata tables that have the begin_snapshot value.

  2. Delete Parquet files:

    SELECT metastore.delete_expired_files();
    

    Once the query is executed, pgpro_metastore performs the following actions:

    1. Deletes Parquet files from pga_files_scheduled_for_deletion, pga_data_file, and pga_file_column_statistics metadata tables.

    2. Recalculates statistics for analytical tables associated with deleted Parquet files in the pga_table_stats metadata table.

    3. Recalculates statistics for columns of analytical tables associated with deleted Parquet files in the pga_table_column_stats metadata table.

    4. Recalculates the column_order value for other Parquet files.