34.1. Architecture #

Postgres Pro is traditionally used for transactional (OLTP) workloads. It determines architectural decisions on data layout and processing algorithms. OLTP workloads have the following characteristics:

  • Operate on small volumes of data

  • Perform read/write operations on most columns

  • Require strict ACID guarantees

  • Require short response time

With these characteristics, an optimal approach is to store and process data in a row-based format.

Analytical (OLAP) workloads are significantly different:

  • Process large volumes of data

  • Perform read-only operations on a small subset of columns

  • Do not require strict ACID guarantees

  • Prioritize throughput over latency

Considering the difference, specialized systems that manage OLAP workloads hold a significant advantage due to their architecture:

  • Network or object-based storage for better scalability and price-to-performance ratio

  • Columnar layout on storage and in memory to work efficiently with individual columns and for higher level of compression

  • Vectorized execution to process data in batches

34.1.1. Data Export and Import #

Operation of the pgpro_duckdb extension is divided into the following parts:

  • Data export and ETL processes

  • Data import and analytical queries execution

With the pgpro_duckdb extension, you can export data and build ETL processes using Postgres Pro tables as a source. This extension can also be used to process files from external systems. Intermediate data can be stored in temporary tables in the duckdb columnar format. All required transformations are made within a standard Postgres Pro session using SQL queries.

34.1.2. File Catalogue and Process Automation #

Automating ETL and maintaining a catalogue for analytical tables are both important parts of an analytical system.

The current version of Postgres Pro Enterprise does not support automatic configuration of the periodic ETL process and implementation of the catalog. To manually configure periodic jobs, use the pgpro_scheduler extension.