2.4. Capacity Planning #
When deploying Postgres Pro AXE, sufficient computational resources must be allocated for executing analytical queries. The amount of required resources is determined by the following storage characteristics:
This section provides guidelines for estimating the computational resources required for different OLAP workloads.
After deploying Postgres Pro AXE, monitor the available computational resources and be prepared to allocate additional resources outside the scope of this estimation if required.
2.4.1. Total Amount of OLAP Data in the Storage #
Sufficient space must be allocated in a local, network, or S3 storage for analytical tables. Parquet files are usually stored in a compressed format, such as Snappy or zstd, and constitute approximately 20% of the primary unstructured text data or Postgres Pro table data.
The storage throughput must be sufficient to sustain all read and write analytical queries at peak workloads.
2.4.2. Amount of Data Processed by a Standard Analytical Query #
Certain analytical queries require a large amount of RAM to be executed effectively, particularly:
joining two large analytical tables
aggregations over a large number of unique keys
sorting large analytical tables
The amount of RAM required for a specific OLAP workload depends on the number and size of analytical queries in the Postgres Pro AXE instance. Monitor the amount of available RAM and allocate additional RAM if required.
For the initial deployment, allocate 4 CPU cores per one analyst and approximately 6 GB of RAM per 1 CPU core reserved for executing analytical queries.
Monitor the amount of temporary files created by pgpro_axe every 24 hours and avoid peak values. Temporary files are located in the directory specified by the duckdb.temporary_directory configuration parameter.
If pgpro_axe lacks RAM for executing an analytical query, the associated OLAP data is placed in temporary files, which significantly increases the execution time. To maximize the total Postgres Pro AXE cluster throughput and the number of analytical queries executed per unit of time, process most analytical queries in RAM.
However, temporary files can be used when processing resource-intensive analytical queries with long execution time (hours).
2.4.3. Required Analytical Query Execution Time #
Certain analytical queries must complete within a specific time limit. In particular, this applies to interactive queries and queries with results that must be available by a certain date and time, for example, at the start of business hours. If sufficient RAM is allocated for executing analytical queries, and temporary files are not created for them, the execution time is determined by the number of CPU cores reserved for executing analytical queries.
Use internal benchmark tests to estimate the required number of CPU cores. During these tests, time-sensitive analytical queries are executed to the test data, the size and structure of which mimic the real data to be used.
2.4.4. Number of Analytical Queries Executed per Unit of Time #
After allocating the computational resources required for executing analytical queries, calculate the total workload from analytical queries executed in parallel. This workload tends to peak once during business hours or when performing resource-intensive ETL operations.
Certain ETL operations can be scheduled to be performed at a specific time of day, whereas others — such as ad-hoc analytical queries — cannot.
Avoid OLAP workload peaks by evenly distributing the execution of analytical queries throughout the day and leaving a 30% margin for unexpected operations.
pgpro_axe processes analytical queries linearly as long as it has sufficient I/O resources, RAM, and CPUs. Once these resources are exhausted, the performance drops.
Recommendations for avoiding resource exhaustion:
eliminate bottlenecks in the Postgres Pro AXE cluster
reduce OLAP workload peaks
optimize the execution of analytical queries
redistribute ETL operations and ad-hoc analytical queries to a dedicated Postgres Pro AXE server, for example, to an asynchronous standby replica
2.4.5. Considerations and Limitations #
Postgres Pro AXE has the following limitations that must be considered when executing OLAP operations:
pgpro_axe executes SQL queries within a single server and is limited by the server capacity. Typically, limit of data that the pgpro_axe can process in a single SQL query on an average server (256 CPU, 2 TB RAM) does not exceed 100 TB.
To process a larger data scope, split the initial data into smaller parts. Process each part in a separate analytical query and save the result in a temporary table. After processing all parts, unite the results in a final analytical query.
For the massively parallel mode with many servers involved, it is required either to balance SQL queries at the application level or to make use of Proxima.
The pgpro_axe SQL syntax is compatible with but does not fully match the PostgreSQL syntax. Certain queries operating within Postgres Pro must be modified to be executable in pgpro_axe.