Appendix I. Glossary

Analytical schema

A separate namespace for the metadata of analytical tables. Analytical schemas are similar to Postgres Pro schemas.

The metadata of analytical schemas is stored in the pga_schema metadata table.

For more information about working with analytical schemas, refer to Section 3.5.

Analytical table

A set of columns and rows that holds OLAP data and keeps a full history of updates to the data and to the table type.

Rows of analytical tables are stored as Parquet files in a storage. The metadata of analytical tables is stored in the pga_table metadata table.

For more information about working with analytical tables, refer to Section 3.3.

Analytical table type

An ordered set of analytical table columns consisting of their names, types, and constraints. The analytical table type is updated when working with columns.

Analytical table view

A Postgres Pro view that provides users with a set of analytical table columns and rows. Views use the metadata of analytical tables from the pgpro_metastore catalog to efficiently execute analytical queries with partition pruning and predicate pushdown.

For more information about creating analytical table views, refer to Section 3.3.3.

Extract, Transform, Load (ETL) operation

An operation that extracts, transforms, or loads the OLAP data of analytical tables. Currently, pgpro_metastore supports the following ETL operations:

Heap table

A standard Postgres Pro table.

Parquet

An open-source binary file format designed for storing and processing large amounts of data. The data is organized in columns rather than rows. This columnar structure allows executing analytical queries faster because only the required columns are scanned, reducing the amount of processed data. The Parquet format also supports compression and encoding, such as column compression, dictionary encoding, and run-length encoding (RLE) to decrease the file size.
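To make the encodings mentioned above concrete, the following sketch applies dictionary encoding and run-length encoding to a single column in plain Python. It illustrates the general ideas only and is not Parquet's actual on-disk layout; the function names and the sample column are hypothetical.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an index into a dictionary of distinct values."""
    dictionary = []
    index_of = {}
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical low-cardinality column, as is common in OLAP data.
country = ["DE", "DE", "DE", "FR", "FR", "DE", "US", "US", "US", "US"]

# Dictionary encoding turns strings into small integers;
# RLE then shortens the runs of repeated integers.
dictionary, indices = dictionary_encode(country)
runs = run_length_encode(indices)
```

Because a column stores values of one type that often repeat, both encodings tend to work better on columnar data than on row-oriented data.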

pgpro_metastore stores the OLAP data as Parquet files.

The information about Parquet files is stored in pga_data_file, pga_files_scheduled_for_deletion, and pga_file_partition_value metadata tables.

For more information about the Parquet format, refer to https://parquet.apache.org.

Partition

A group of Parquet files created according to partitioning criteria, such as specific column values or ranges of column values. Each analytical table has at least one partition. Additional partitions can be created in one of the following ways:

  • automatically, when the size limit for Parquet files in a partition is reached

  • according to partitioning criteria

Partitioning

The process of distributing the OLAP data of analytical tables between Parquet files based on column values. Partitioning is performed in a way that reduces the execution time of queries to analytical tables and allows predicate pushdown to skip Parquet files that do not meet query conditions. Analytical tables can be repartitioned at any time.

Currently, only Hive partitioning provided by DuckDB is supported.
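Hive partitioning encodes partition column values in directory names of the form column=value, which lets a reader skip whole directories without opening any files. The sketch below illustrates that idea in plain Python. The paths, column names, and helper functions are hypothetical and do not reflect the actual file layout that pgpro_metastore produces.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Extract {column: value} from the key=value directory segments of a path."""
    values = {}
    for segment in PurePosixPath(path).parent.parts:
        if "=" in segment:
            column, _, value = segment.partition("=")
            values[column] = value
    return values


def prune(paths, **conditions):
    """Keep only files whose partition values match every equality condition."""
    return [
        path
        for path in paths
        if all(
            partition_values(path).get(column) == str(value)
            for column, value in conditions.items()
        )
    ]


# Hypothetical Hive-style layout for a table partitioned by year and region.
files = [
    "sales/year=2023/region=eu/part-0.parquet",
    "sales/year=2023/region=us/part-0.parquet",
    "sales/year=2024/region=eu/part-0.parquet",
    "sales/year=2024/region=eu/part-1.parquet",
]

# Equivalent of the query condition: WHERE year = 2024 AND region = 'eu'
to_scan = prune(files, year=2024, region="eu")
```

Here only two of the four files remain to be scanned; the decision is made from the paths alone, before any Parquet file is read.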

pgpro_metastore catalog

The axe_catalog schema where metadata tables are stored. It can be created on the pgpro_metastore server or on a separate server.

pgpro_metastore objects

Main pgpro_metastore entities to which access privileges can be granted:

Shared directory

A storage directory where Parquet files are located when adding OLAP data to analytical tables or exporting it from them.

The metadata of shared directories is stored in the pga_folder metadata table.

For more information about shared directories, refer to Section 3.6.

Snapshot

An entity used for supporting the temporality of pgpro_metastore by keeping the history of OLAP data and metadata updates. All user actions result in snapshot creation, and each snapshot is associated with a single update.

One specific use of snapshots is for analytical tables. Such a snapshot reflects the state of the analytical table, including its OLAP data, name, and type, at a particular moment in time. Analytical tables are updated in transactions, and a separate snapshot is created for each update. You can restore an analytical table to any earlier state, provided that the OLAP data for that state was not deleted using the expire_snapshot ETL operation.
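The relationship between updates, snapshots, and expiration can be pictured with a small in-memory model. This is a deliberately simplified conceptual sketch: the class, its methods, and the integer snapshot IDs are hypothetical and are not the pgpro_metastore API.

```python
class ToyTable:
    """Toy model: every update records a new snapshot of the full table state."""

    def __init__(self):
        self._snapshots = {}  # snapshot ID -> immutable copy of the rows
        self._next_id = 1
        self.rows = ()

    def update(self, rows):
        """Apply an update and record exactly one snapshot for it."""
        self.rows = tuple(rows)
        snapshot_id = self._next_id
        self._snapshots[snapshot_id] = self.rows
        self._next_id += 1
        return snapshot_id

    def expire(self, snapshot_id):
        """Drop a snapshot's data; that state can no longer be restored."""
        del self._snapshots[snapshot_id]

    def restore(self, snapshot_id):
        """Return the table to a recorded state, if it has not been expired."""
        if snapshot_id not in self._snapshots:
            raise LookupError(f"snapshot {snapshot_id} is not available")
        self.rows = self._snapshots[snapshot_id]


table = ToyTable()
first = table.update([("a", 1)])
second = table.update([("a", 1), ("b", 2)])

table.restore(first)   # the table holds only ("a", 1) again
table.expire(second)   # restoring `second` would now raise LookupError
```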

The metadata of snapshots is stored in the pga_snapshot metadata table.

For snapshot usage examples, refer to Appendix H.

Storage

Physical location of Parquet files and shared directories. Postgres Pro AXE supports local, network, and S3 storages.

S3 (Simple Storage Service) storages are cloud storages for any type of data that can be accessed via API. The data is stored as objects inside buckets (containers) with unique IDs and metadata, such as type, creation date and time, and access privileges. The main advantages of such storages are their scalability, flexibility, and accessibility from anywhere over the internet.

The metadata of storages is stored in the pga_storage metadata table.

For more information about working with storages, refer to Section 3.2.

Uniform Resource Identifier (URI)

A connection string for the storage that contains information required to initialize the data storage layer. It consists of the connection prefix (for example, 'file://' or 's3://'), network address, port number, and path. In addition, the S3 bucket name, storage region, and protocol can be specified.
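The components listed above map onto the standard URI structure, so they can be separated with Python's urllib.parse module. The example URIs, host, port, and bucket name below are made up for illustration; the exact URI forms that Postgres Pro AXE accepts are not shown in this glossary.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a storage URI into prefix, network address, port, and path."""
    parts = urlsplit(uri)
    return {
        "prefix": parts.scheme + "://",
        "address": parts.hostname,  # None when the URI has no network address
        "port": parts.port,         # None when no port is given
        "path": parts.path,
    }


# Hypothetical local storage: no network address or port.
local = describe("file:///var/lib/axe/storage")

# Hypothetical S3-compatible endpoint with an explicit port;
# here the bucket name is assumed to be the first path segment.
remote = describe("s3://storage.example.com:9000/my-bucket/tables")
```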