29.1. Adding Parquet Files to an Analytical Table (metastore.add_files)
Required privileges:
INSERT privilege on the analytical table.
SELECT privilege on the shared directory if Parquet files are added from this directory.
For more information about stored procedures and privileges, refer to Section 22.1.
Execute the following command:
SELECT metastore.add_files('table_name', 'path_to_Parquet_files', 'path_to_JSON');
Where:
table_name: The name of the analytical table to which Parquet files are added.
path_to_Parquet_files: The path to the Parquet files that are added to the analytical table.
Possible values:
A path to any storage directory starting with the storage prefix, such as file:///tmp/my_data/ for local storage or s3://bucket/path/ for S3 storage.
A path within a shared directory from the pga_folder metadata table, starting with the directory name.
For multiple Parquet files, place all files in one directory and specify the directory path ending with /. The directory must contain only the Parquet files being added.
For a single Parquet file, specify the full path ending with the filename.
path_to_JSON: The path to a JSON file with Parquet file storage parameters.
These parameters apply when new Parquet files are created. In the metastore.add_files stored procedure, these parameters are ignored for non-partitioned tables, because Parquet files are added as is, but they apply to partitioned tables, where Parquet files are split into multiple files. In the metastore.copy_table stored procedure, the parameters always apply because new Parquet files are created from the SQL command results.
For more information about partitioning, refer to Chapter 30.
Optional parameter.
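The path forms described above can be sketched as the following calls. This is an illustration under stated assumptions, not a definitive reference: table_example is borrowed from Example 29.1, while my_data, shared_dir, data.parquet, and options.json are hypothetical names. The sketch also assumes that the optional path_to_JSON argument can simply be omitted from the call.

```sql
-- Hypothetical names: my_data, shared_dir, data.parquet, options.json.

-- Multiple files from local storage: the directory path ends with /.
SELECT metastore.add_files('table_example', 'file:///tmp/my_data/');

-- A single file from S3 storage: the full path ends with the filename.
SELECT metastore.add_files('table_example', 's3://bucket/path/data.parquet');

-- Files from a shared directory listed in pga_folder, with storage parameters.
-- The JSON file is kept outside my_data/ because the source directory
-- must contain only the Parquet files being added.
SELECT metastore.add_files('table_example', 'shared_dir/my_data/', 'shared_dir/options.json');
```

In the last call, the storage parameters take effect only if table_example is partitioned. For a non-partitioned table, the files are added as is.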
Postgres Pro AXE performs the following actions:
Verifies input parameters and user privileges.
Ensures metadata compatibility between Parquet files and the analytical table: the number, order, names, and types of columns must match.
Creates new entries in the pga_snapshot and pga_data_file metadata tables.
Copies Parquet files to the storage directory of the analytical table, to a new subdirectory with the snapshot ID as the name.
If Parquet files are added to a partitioned analytical table, they are split into multiple files based on partition columns, and a directory tree is created for these files.
Updates statistics in the pga_table_stats, pga_table_column_stats, and pga_file_column_statistics metadata tables.
Example 29.1. Executing the metastore.add_files stored procedure
SELECT metastore.add_files('table_example', 'folder/file.parquet', 'folder/options.json');