29.1. Adding Parquet Files to an Analytical Table (metastore.add_files)

Required privileges:

  • INSERT privilege on the analytical table.

  • SELECT privilege on the shared directory if Parquet files are added from this directory.

For more information about stored procedures and privileges, refer to Section 22.1.

Execute the following command:

  SELECT metastore.add_files('table_name', 'path_to_Parquet_files', 'path_to_JSON');

Where:

  • table_name: The name of the analytical table to which Parquet files are added.

  • path_to_Parquet_files: The path to Parquet files that are added to the analytical table.

    Possible values:

    • A path to any storage directory starting with the storage prefix, such as file:///tmp/my_data/ for local storage or s3://bucket/path/ for S3 storage.

    • A path within a shared directory from the pga_folder metadata table, starting with the directory name.

    For multiple Parquet files, place all files in one directory and specify the directory path ending with /. The directory must contain only the Parquet files being added.

    For a single Parquet file, specify the full path ending with the filename.

  • path_to_JSON: The path to a JSON file with Parquet file storage parameters.

    These parameters apply only when new Parquet files are created. In the metastore.add_files stored procedure, they are ignored for non-partitioned tables, because the Parquet files are added as is, and applied for partitioned tables, where the Parquet files are split into multiple files. In the metastore.copy_table stored procedure, they always apply because new Parquet files are created from the results of the SQL command.

    For more information about partitioning, refer to Chapter 30.

    Optional parameter.
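The path forms described above can be sketched as follows. The table name, file name, and shared directory name (folder) are placeholders, and the storage paths reuse the sample prefixes given above. The two-argument calls assume that the optional path_to_JSON argument can simply be omitted, which the parameter description implies but does not show explicitly.

```sql
-- Add every Parquet file from a local storage directory.
-- The path starts with the storage prefix and ends with "/".
SELECT metastore.add_files('table_example', 'file:///tmp/my_data/');

-- Add a single Parquet file from S3 storage.
-- The path ends with the filename (file.parquet is a placeholder).
SELECT metastore.add_files('table_example', 's3://bucket/path/file.parquet');

-- Add all files from a shared directory listed in pga_folder.
-- The path starts with the directory name ("folder" is a placeholder);
-- the JSON file only has an effect if table_example is partitioned.
SELECT metastore.add_files('table_example', 'folder/', 'folder/options.json');
```

In the directory forms, the directory is expected to contain only the Parquet files being added.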

Postgres Pro AXE performs the following actions:

  1. Verifies input parameters and user privileges.

  2. Checks that the metadata of the Parquet files is compatible with the analytical table: the number, order, names, and types of columns must match.

  3. Creates new entries in pga_snapshot and pga_data_file metadata tables.

  4. Copies Parquet files to the storage directory of the analytical table, to a new subdirectory with the snapshot ID as the name.

    If Parquet files are added to a partitioned analytical table, they are split into multiple files based on partition columns, and a directory tree is created for these files.

  5. Updates statistics in pga_table_stats, pga_table_column_stats, and pga_file_column_statistics metadata tables.

Example 29.1. Executing the metastore.add_files stored procedure

  SELECT metastore.add_files('table_example', 'folder/file.parquet', 'folder/options.json');