5.2. Metadata Tables #
- 5.2.1.
pga_snapshotMetadata Table- 5.2.2.
pga_snapshot_changesMetadata Table- 5.2.3.
pga_schemaMetadata Table- 5.2.4.
pga_tableMetadata Table- 5.2.5.
pga_storageMetadata Table- 5.2.6.
pga_uriMetadata Table- 5.2.7.
pga_folderMetadata Table- 5.2.8.
pga_columnMetadata Table- 5.2.9.
pga_data_fileMetadata Table- 5.2.10.
pga_files_scheduled_for_deletionMetadata Table- 5.2.11.
pga_table_statsMetadata Table- 5.2.12.
pga_table_column_statsMetadata Table- 5.2.13.
pga_file_column_statisticsMetadata Table- 5.2.14.
pga_transaction_logMetadata Table - 5.2.2.
The metadata stored in metadata tables of pgpro_metastore is required for the following functionality:
Creating analytical views.
Querying analytical tables that use partitioning and indexes of Parquet files.
Supporting basic ETL operations:
performing ETL operations
scheduling ETL operations to be performed at a certain date and time, as well as sequentially (one after another)
limiting resources available for ETL operations to avoid impacting the OLTP workload
Working with the OLAP data:
creating Parquet files with the OLAP data from Postgres Pro tables and importing this data to analytical tables
exporting the OLAP data from analytical tables to Parquet, CSV, and JSON files
creating reports with filtered OLAP data using SQL queries
Making transaction updates to pgpro_metastore objects, as well as providing consistent updates of analytical tables when reading and writing to these tables.
Checking data integrity and restoring Parquet files from backups or by using S3 storage mirrors.
Monitoring pgpro_metastore and notifying administrators about events requiring their immediate attention.
Storing Parquet files in separate storages to avoid competition with the OLTP workload for I/O throughput.
The schema of metadata tables is shown in the diagram below.
Figure 5.1. Schema of Metadata Tables
For more information about each metadata table, refer to corresponding subsections of this section.
5.2.1. pga_snapshot Metadata Table #
This metadata table contains information about snapshots.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key | The snapshot ID. The value is increased by 1 for each created snapshot. This is a primary key referred to by most metadata tables. |
|
|
| The snapshot creation date and time. |
|
|
| The pgpro_metastore catalog version. The value is increased by 1 at each pgpro_metastore object update, for example, when a new analytical table is created. |
|
|
| The ID assigned to the next pgpro_metastore object, such as an analytical table. The value is increased each time an object is created. |
|
|
| The ID assigned to the next created Parquet file. The value is increased each time a file is created. |
|
|
| Specifies whether pgpro_metastore objects referring to the snapshot are visible. Possible values:
When an ETL operation, such as After this, the You can use the |
5.2.2. pga_snapshot_changes Metadata Table #
This metadata table contains information about snapshot changes. You can use it to resolve conflicts between different snapshots.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key, foreign key | The ID of the snapshot from the |
|
| - | A comma-separated list of snapshot changes in the following formats:
|
5.2.3. pga_schema Metadata Table #
This metadata table contains information about analytical schemas.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key | The analytical schema ID assigned by increasing the |
|
|
| The persistent analytical schema ID stored for compatibility with Postgres Pro schemas. |
|
| Foreign key, | The snapshot from the |
|
| Foreign key | The snapshot from the If the value is |
|
|
| The unique analytical schema name, for example, |
5.2.4. pga_table Metadata Table #
This metadata table contains information about analytical tables.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key | The analytical table ID assigned by increasing the |
|
|
| The persistent analytical table ID stored for compatibility with Postgres Pro tables. |
|
| Foreign key, | The snapshot from the |
|
| Foreign key | The snapshot from the If the value is |
|
|
| The ID of the analytical schema from the |
|
|
| The analytical table name, for example,
If |
|
|
| The path to the shared directory relative to the The absolute path is generated by concatenating the |
|
| Foreign key, | The ID of the storage from the |
5.2.5. pga_storage Metadata Table #
This metadata table contains information about storages.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key | The storage ID assigned by increasing the |
|
| Foreign key, | The snapshot from the |
|
| Foreign key | The snapshot from the If the value is |
|
|
| The storage name, for example,
If |
|
|
| The URI of the root storage directory. Local storages have simple URIs, for example, S3 storages have URIs with more complicated structure and parsing, for example, |
|
|
| The URI of the storage directory for temporary files. Local storages have simple URIs, for example, S3 storages have URIs with more complicated structure and parsing, for example, |
|
|
| The storage type. Currently, local, network, and S3 storages are supported. |
|
|
| The minimum free space for the storage, in bytes. It is verified when executing the |
|
| Foreign key, | The ID of the URI from the |
|
| Foreign key, | The ID of the URI from the |
5.2.6. pga_uri Metadata Table #
This metadata table contains system pgpro_metastore information, and it is not intended to be updated by users.
The uri column contains initial serialized URIs associated with uri or tmp_uri values of the pga_storage metadata table.
The path column contains paths to root directories of storages. This is the only relevant column for local storages, and its values usually match uri values, except for prefixes specifying the storage type:
file://for local storagess3://for S3 storages
Remaining columns contain arbitrary URIs for S3 storages.
For more information, refer to RFS 3968 and to the official AWS documentation.
5.2.7. pga_folder Metadata Table #
This metadata table contains information about shared directories used for storing Parquet files. For example, you can use these directories as a source of Parquet files for the add_files ETL operation.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key | The shared directory ID assigned by increasing the |
|
| Foreign key, | The snapshot from the |
|
| Foreign key | The snapshot from the If the value is |
|
|
| The path to the shared directory relative to the root storage directory. The absolute path is generated by concatenating the |
|
|
| The shared directory name, for example,
If |
|
| Foreign key, | The ID of the storage from the |
5.2.8. pga_column Metadata Table #
This metadata table contains information about columns of analytical tables.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key | The column ID. If the initial Parquet file contains the
|
|
| Foreign key, | The snapshot from the |
|
| Foreign key | The snapshot from the If the value is |
|
| Foreign key, | The ID of the analytical table from the |
|
|
| The sequence number of the column.
|
|
|
| The column name, for example,
If |
|
|
| The column type corresponding to one of the supported pgpro_metastore data types. |
|
| - | The initial column value after its creation, for example, after executing It can be |
|
| - | The default column value used in It can be |
|
| - | Specifies whether the column can contain Possible values:
|
|
| - | Specifies whether the column can contain repeating values for different rows. Possible values:
|
5.2.9. pga_data_file Metadata Table #
This metadata table contains information about Parquet files.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key | The Parquet file ID assigned by increasing the |
|
| Foreign key, | The snapshot from the |
|
| Foreign key | The snapshot from the If the value is |
|
| Foreign key, | The ID of the analytical table from the |
|
|
| The vertical position of the Parquet file.
|
|
|
| The path to the Parquet file, for example, |
|
|
| Specifies whether the path to the Parquet file is relative to the path to the analytical table or absolute. Possible values:
Default value: false. |
|
|
| The file format. Currently, only the Parquet format is supported. |
|
|
| The number of rows in the Parquet file. |
|
|
| The Parquet file size, in bytes. |
|
|
| The |
|
|
| The current checksum value of the Parquet file. It is used for data integrity checks. |
5.2.10. pga_files_scheduled_for_deletion Metadata Table #
This metadata table contains information about Parquet files that are not associated with snapshots and are scheduled for deletion. Parquet files are disassociated from snapshots after executing the expire_snapshot ETL operation, which marks snapshots as expired.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key, foreign key | The ID of the Parquet file from the |
|
|
| The path to the Parquet file, for example, |
|
|
| Specifies whether the path to the Parquet file is relative to the path to the analytical table or absolute. Possible values:
Default value: false. |
|
|
| The scheduled Parquet file deletion date and time. |
5.2.11. pga_table_stats Metadata Table #
This metadata table contains statistics of analytical tables.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key, foreign key | The ID of the analytical table from the |
|
|
| The number of rows in the analytical table. |
|
|
| The |
|
|
| The total size of all Parquet files associated with the analytical table, in bytes. |
5.2.12. pga_table_column_stats Metadata Table #
This metadata table contains statistics of analytical table columns.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key, foreign key | The ID of the column from the |
|
|
| Specifies whether the column can contain Possible values:
|
|
| - | Specifies whether the column can contain Possible values:
This is relevant only for |
|
| - | The minimum column value. It is displayed as a string for consistency and must be converted to the current column data type when used. |
|
| - | The maximum column value. It is displayed as a string for consistency and must be converted to the current column data type when used. |
5.2.13. pga_file_column_statistics Metadata Table #
This metadata table contains statistics of analytical table columns in the context of Parquet files.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key, foreign key | The ID of the Parquet file from the |
|
| Primary key, foreign key | The ID of the column from the |
|
|
| The column size, in bytes. |
|
|
| The number of column elements in the Parquet file. This is not the same as the number of Parquet file entries since there can be nested entries, such as structures and lists. |
|
|
| The number of |
|
| - | The minimum column value in the Parquet file. You can use it for estimating the upper bound of the minimum column value. |
|
| - | The maximum column value in the Parquet file. You can use it for estimating the lower bound of the maximum column value. |
5.2.14. pga_transaction_log Metadata Table #
This metadata table contains logging information about pgpro_metastore tasks analogous to standard DBMS transactions.
Column Name | Column Type | Constraint | Description |
|---|---|---|---|
|
| Primary key | The transaction log sequence number (LSN). |
|
|
| The task ID. |
|
| - | The ETL operation ID. One task can consist of multiple ETL operations. |
|
|
| The ETL operation type, such as |
|
| - | The initial and runtime data associated with the task. |