34.3. Deployment Recommendations #

The pgpro_duckdb extension can function within any Postgres Pro Enterprise instance and execute analytical queries using different data sources, including Postgres Pro tables, as well as local, network, and S3 storages. This section provides general recommendations for deploying the built-in analytical platform and explains how to distribute OLAP workload across Postgres Pro servers.

For all deployment scenarios, most of OLAP data must be stored as Parquet files in a local, network, or S3 storage. Queries to Postgres Pro heap tables are only used for exporting the initial OLAP data to Parquet files. Heap tables can also be used to fetch recent changes from the original tables that are not yet exported and then to combine them with OLAP data from Parquet files to perform computations on the latest state.

When choosing a deployment scenario, consider the following factors:

  • Impact on OLTP workload from analytical queries:

    • Analytical queries are executed on the primary server.

    • Analytical queries are executed on a synchronous standby server.

    • Analytical queries are executed on an asynchronous standby server.

    • Analytical queries are executed on a standalone Postgres Pro server.

    For more information, see Impact on OLTP Workload from Analytical Queries.

  • Requirements for data security and OLAP workload distribution:

    • Multiple analyst groups share a single Postgres Pro server.

    • Each analyst group uses a dedicated Postgres Pro server.

    For more information, see Requirements for Data Security and OLAP Workload Distribution.

  • Available network, storage, and computational resources:

    • OLAP data is stored locally and can only be accessed by a single Postgres Pro Enterprise instance.

    • OLAP data is stored in a network storage and can be accessed by multiple Postgres Pro Enterprise instances.

    • OLAP data is stored in an S3 storage and can be accessed by multiple Postgres Pro Enterprise instances.

    For more information, see Available Network, Storage, and Computational Resources.

34.3.1. Impact on OLTP Workload from Analytical Queries #

This section describes a deployment scenario with a primary server, a synchronous standby server, and an asynchronous standby server. OLTP workload is processed by the primary server. The scenario does not differentiate between physical and logical replication.

The risk of analytical queries affecting OLTP workload depends on where they are executed:

  • Primary server: The risk is high, since OLTP and OLAP workloads compete for the same computational resources within a single Postgres Pro cluster. In this case, a wrong analytical query may interfere with the processing of OLTP workload.

  • Synchronous standby server: The risk is decreased, but not absent, since the synchronous standby server may still fail due to a wrong analytical query and affect the primary server depending on the server configuration.

  • Asynchronous standby server: The risk is low. In this case, even if the asynchronous standby server fails due to a wrong analytical query, the primary server continues to process OLTP workload.

Recommendations for processing OLAP workload:

  • Use the primary server and synchronous standby servers for processing the stable and verified OLAP workload according to the available computational resources. At the same time, avoid granting direct access rights to analyst groups, as well as executing ad-hoc analytical queries, unverified analytical queries, and analytical queries with unpredictable resource usage.

  • Use asynchronous standby servers for processing most of OLAP workload and testing it before processing on the primary server or on a synchronous standby server. You can use cascading standby servers that do not impact the primary server for interactive analysis and executing ad-hoc analytical queries.

  • Use standalone Postgres Pro servers working with shared network or S3 storages for creating, testing, and executing ad-hoc analytical queries with unpredictable execution time.

34.3.2. Requirements for Data Security and OLAP Workload Distribution #

Currently, the pgpro_duckdb extension uses a basic access model managed by the role specified in the duckdb.postgres_role configuration parameter. Members of this role have full access to pgpro_duckdb functions. To differentiate data access for multiple analyst groups, use external tools. For example, you can configure access on the level of the application that verifies access rights of a particular user and then creates and sends an analytical query to the pgpro_duckdb extension.

To allow all analysts to execute analytical queries and simultaneously grant them different access rights, use dedicated Postgres Pro servers for each analyst group. In this case, access rights to the required data, such as S3 storage data, can be configured on each server with required granularity.

34.3.3. Available Network, Storage, and Computational Resources #

When deploying the built-in analytical platform, distribute OLAP workload across network nodes so that the resulting topology meets performance requirements.

Recommendations for building the topology:

  • Determine the main OLAP processes and applications, along with time limits for performing regular operations.

  • Create a diagram for data flows between storages.

  • Ensure that the network and storages provide sufficient throughput for performing regular operations with a recommended 30% margin.

  • Select the network nodes that will be used for processing most of the OLAP workload. Ensure that these nodes have sufficient computational resources for performing regular operations within the required time limit with a recommended 30% margin.

  • Ensure that there are no bottlenecks. If there are, consider adding computational resources or changing the topology.