34.1. Architecture #
Postgres Pro is traditionally used for transactional (OLTP) workloads. It determines architectural decisions on data layout and processing algorithms. OLTP workloads have the following characteristics:
Operate on small volumes of data
Perform read/write operations on most columns
Require strict ACID guarantees
Require short response time
With these characteristics, an optimal approach is to store and process data in a row-based format.
Analytical (OLAP) workloads are significantly different:
Process large volumes of data
Perform read-only operations on a small subset of columns
Do not require strict ACID guarantees
Prioritize throughput over latency
Considering the difference, specialized systems that manage OLAP workloads hold a significant advantage due to their architecture:
Network or object-based storage for better scalability and price-to-performance ratio
Columnar layout on storage and in memory to work efficiently with individual columns and for higher level of compression
Vectorized execution to process data in batches
34.1.1. Data Export and Import #
Operation of the pgpro_duckdb extension is divided into the following parts:
Data export and ETL processes
Data import and analytical queries execution
With the pgpro_duckdb extension, you can export data and build ETL processes using Postgres Pro tables as a source. This extension can also be used to process files from external systems. Intermediate data can be stored in temporary tables in the duckdb
columnar format. All required transformations are made within a standard Postgres Pro session using SQL queries.
34.1.2. File Catalogue and Process Automation #
Automating ETL and maintaining a catalogue for analytical tables are both important parts of an analytical system.
The current version of Postgres Pro Enterprise does not support automatic configuration of the periodic ETL process and implementation of the catalog. To manually configure periodic jobs, use the pgpro_scheduler extension.