G.4. pgpro_planner — additional query transformations and optimizations #

G.4.1. Description #

The pgpro_planner extension provides query transformations and optimizations that allow producing efficient query execution plans and improving performance.

These optimizations are the following:

  • transformation of x IN (VALUES (...), (...)) constructions into x = ANY([...]) constructions

  • simplification of x + 0, x - 0, x * 1, x / 1, and x * 0 operations to x

  • caching results of correlated subqueries using the Memoize nodes

By default, the extension is disabled to avoid implicit or unexpected query plan changes. You can enable it as described below. Each transformation can also be enabled or disabled separately using the corresponding configuration parameters to fit specific user needs.

G.4.2. Installation #

The pgpro_planner extension is provided with Postgres Pro Enterprise as a separate pre-built package pgpro-planner-ent-17 (for the detailed installation instructions, see Chapter 17).

To enable pgpro_planner, do the following:

  1. Load the extension in one of the following ways:

    • globally for all sessions by adding the library name to the shared_preload_libraries variable in the postgresql.conf file

      shared_preload_libraries = 'pgpro_planner'
      
    • in the current session using the LOAD command without adding the library name to shared_preload_libraries

      LOAD 'pgpro_planner';
      
  2. Reload the database server for the changes to take effect.

    To verify that the pgpro_planner library is installed correctly, you can run the following command:

    SHOW shared_preload_libraries;
    
  3. Enable the extension by setting the pgpro_planner.enable parameter to on.

    SET pgpro_planner.enable = 'on';
    

G.4.3. Transformation of VALUES into ANY #

Typically, the standard planner processes x IN (VALUES (...), (...)) constructions in a suboptimal way. If subqueries contain these constructions, the planner can also rewrite them to the JOIN clauses to avoid repeated scans. This can lead to wrong optimization decisions and inefficient query execution plans.

The pgpro_planner extension can transform basic x IN (VALUES (...), (...)) constructions into x = ANY([...]) constructions. This transformation applies to the parse tree of a query before the planner starts to create a query execution plan.

pgpro_planner processes queries as follows:

  1. Searches for x IN (VALUES (...), (...)) constructions in subqueries.

  2. Tries to extract all values from the VALUES expression.

  3. Builds an array from the extracted values.

  4. Creates the corresponding x = ANY([...]) construction.

  5. In the parse tree, replaces the initial subquery with the corresponding scalar array operation.

This transformation allows the planner to avoid excessive JOIN clauses and cardinality estimation errors. For ANY([...]) constructions, the planner can also produce query plans with any access method and hash matching strategy that are most effective in each particular case.

You can disable this transformation by setting the pgpro_planner.enable_values_transformation parameter to off, for instance, if you do not use the VALUES expressions.

G.4.3.1. VALUES Transformation Example #

This example demonstrates how pgpro_planner transformation works for the following sample query:

EXPLAIN (COSTS FALSE) SELECT * FROM onek
  WHERE unique1 IN (VALUES(1200), (1));

Without this transformation, the query execution plan looks as follows:

Nested Loop
  ->  Unique
        ->  Sort
              Sort Key: "*VALUES*".column1
              ->  Values Scan on "*VALUES*"
  ->  Index Scan using onek_unique1 on onek
        Index Cond: (unique1 = "*VALUES*".column1)
(7 rows)

For the VALUES expression, the planner creates the corresponding temporary table and performs value comparisons using the Nested Loop and Index Cond nodes. These decisions lead to long query execution time.

When pgpro_planner transforms the VALUES expression, the resulting plan uses the more effective Bitmap Heap Scan node instead of Nested Loop, which reduces query execution time.

Bitmap Heap Scan on onek
  Recheck Cond: (unique1 = ANY ('{1200,1}'::integer[]))
  ->  Bitmap Index Scan on onek_unique1
        Index Cond: (unique1 = ANY ('{1200,1}'::integer[]))
(4 rows)

G.4.3.2. VALUES Transformation Limitations #

pgpro_planner cannot transform the VALUES expression in the following cases:

  • The expression contains volatile functions.

  • The expression contains non-constant values, for instance, VALUES (0), (unique1).

  • The expression contains values of complex data types, for instance, VALUES (1,1), (20,0).

  • The expression contains NULL values.

  • The expression contains the LIMIT, OFFSET, or ORDER BY clauses.

  • The expression is used in a subquery nested within another subquery.

  • There is no operator for the resulting x = ANY([...]) variable and array.

G.4.4. Simplification of Trivial Arithmetic Operations #

Even for trivial arithmetic operations, such as x + 0, the standard planner predicts selectivity and cardinality as it does for expressions and cannot use indexes in query execution plans.

The pgpro_planner extension can simplify and rewrite the following basic arithmetic operations:

  • x + 0x

  • x - 0x

  • x * 1x

  • x / 1x

  • x * 0x (only if x is not NULL)

This simplification allows the planner not only to avoid unnecessary calculations, but also to reduce estimation errors, produce plans with indexes and effective access methods, and thus significantly improve performance.

You can disable this simplification by setting the pgpro_planner.enable_simplify_trivials parameter to off, for instance, for debugging purposes.

G.4.4.1. Arithmetic Simplification Example #

This example demonstrates how pgpro_planner arithmetic simplification affects execution plans for the following query:

SELECT * FROM t WHERE x + 0 > 900;

For the initial arithmetic operation, the planner produces the query execution plan with Seq Scan and does not use an index.

Seq Scan on t  (cost=0.00..20.00 rows=333 width=4) (actual time=0.500..0.571 rows=100.00 loops=1)
  Filter: ((x + 0) > 900)
  Rows Removed by Filter: 900
  Buffers: shared hit=5
Planning:
  Buffers: shared hit=14
Planning Time: 0.432 ms
Execution Time: 0.627 ms
(8 rows)

When pgpro_planner simplifies this operation, the planner creates the plan with Index Only Scan and performs value comparisons in Index Cond, which helps to reduce query execution time.

Index Only Scan using idx_t_x on t  (cost=0.28..10.03 rows=100 width=4) (actual time=0.077..0.247 rows=100.00 loops=1)
  Index Cond: (x > 900)
  Heap Fetches: 100
  Index Searches: 1
  Buffers: shared hit=4
Planning Time: 0.217 ms
Execution Time: 0.319 ms
(7 rows)

G.4.4.2. Arithmetic Simplification Limitations #

pgpro_planner has the following limitations for arithmetic simplification:

  • Only the following operations are supported: x + 0, x - 0, x * 1, x / 1, and x * 0 (if x is not NULL).

  • Only the following data types for constants are supported: int2, int4, float, and numeric.

  • Operations that use complex expressions to get 0 or 1, such as x - 100 - 100, cannot be simplified.

  • Operations that use volatile functions cannot be simplified.

G.4.5. Caching of Correlated Subqueries #

Correlated subqueries use values from their outer queries, for instance, via the WHERE clause. These subqueries are executed once for each row of the outer query, so they can have major impact on performance.

The pgpro_planner extension can cache results of parameterized correlated subqueries by adding the Memoize nodes to query execution plans. This allows skipping identical scans for the same parameters and improving performance.

This functionality is enabled if the global enable_memoize configuration parameter and the pgpro_planner.memoize_subplan parameter of pgpro_planner are set to on (default).

pgpro_planner processes a query during the plan creation stage. It performs the following steps:

  1. pgpro_planner searches for supported correlated subqueries.

  2. If the pgpro_planner.memoize_check_uniqueness parameter is set to on (default), pgpro_planner checks uniqueness for values of columns referenced in subqueries based on statistical information. This helps to determine whether the Memoize node can be useful.

    The Memoize node can be added in the following cases:

    • A unique index on a column does not exist, and column values are not unique according to statistical information.

    • A unique index on a column exists but a column contains NULL values according to statistical information.

    • A multicolumn unique index exists but values of the referenced column are not unique according to statistical information.

    • Statistical information for a column is not gathered yet regardless of whether an index exists.

  3. pgpro_planner estimates the expected cost of the resulting plan with the Memoize node.

  4. If the estimated cost is lower than the cost of the initial plan, pgpro_planner adds the Memoize node into the head of the corresponding subplan.

You can disable caching of correlated subqueries by setting the pgpro_planner.memoize_subplan parameter to off, for instance, if you use correlated subqueries rarely. To disable uniqueness check for debugging purposes, set the pgpro_planner.memoize_check_uniqueness parameter to off.

G.4.5.1. Subquery Caching Example #

This example demonstrates how caching of correlated subqueries works in different cases.

Create two sample tables named t and t1 and a unique index on the x column of the first table.

CREATE TABLE t (x int, y int);
INSERT INTO t SELECT id, id%20 FROM generate_series(1,1000) id;
CREATE UNIQUE INDEX t_x_idx ON t (x);
CREATE TABLE t1 (x1 int);
INSERT INTO  t1 SELECT id%2 FROM generate_series(1,1000) id;

Run the following query, where the subquery uses values of the correlated column named y. There is no unique index on this column, so the Memoize node is added to the query execution plan.

EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.y LIMIT 1);
                    QUERY PLAN
------------------------------------------------
  Seq Scan on t
    Filter: (SubPlan 1)
    SubPlan 1
      ->  Memoize
            Cache Key: t.y
            Cache Mode: binary
            ->  Limit
                  ->  Aggregate
                        ->  Seq Scan on t1
                              Filter: (x1 = t.y)
(10 rows)

Run another query, where the subquery uses values of the correlated column named x. The unique index on this column exists but statistical information is not gathered yet, so the Memoize node is generated.

EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
                    QUERY PLAN
------------------------------------------------
  Seq Scan on t
    Filter: (SubPlan 1)
    SubPlan 1
      ->  Memoize
            Cache Key: t.x
            Cache Mode: binary
            ->  Limit
                  ->  Aggregate
                        ->  Seq Scan on t1
                              Filter: (x1 = t.x)
(10 rows)

Execute the same query once again. Statistical information for the x column is now gathered, and the unique index still exists, so the Memoize node is not added to the plan.

EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
                            QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on t
   Filter: (x = (SubPlan 1))
   SubPlan 1
     ->  Limit
           ->  Aggregate
                 ->  Seq Scan on t1
                       Filter: (x1 = t.x)
(7 rows)

Remove the previously created index and create a new multicolumn index on both columns named x and y.

CREATE UNIQUE INDEX on t(x,y);
DROP INDEX t_x_idx;

Run the query referencing the correlated y column. Although the multicolumn unique index now exists, values of this column are not unique according to statistical information. The Memoize node is added to the resulting plan.

EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.y LIMIT 1);
                            QUERY PLAN
-----------------------------------------------------------------
  Seq Scan on t
    Filter: (((x = (SubPlan 1).col1) AND (y = (SubPlan 1).col2)))
    SubPlan 1
      ->  Memoize
            Cache Key: t.y
            Cache Mode: binary
            ->  Limit
                  ->  Aggregate
                        ->  Seq Scan on t1
                              Filter: (x1 = t.y)
(10 rows)

Run the query referencing the correlated x column. For this case, the Memoize node is not still generated, since values of this column are unique according to statistical information.

EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
                            QUERY PLAN
-----------------------------------------------------------------
  Seq Scan on t
    Filter: (((x = (SubPlan 1).col1) AND (y = (SubPlan 1).col2)))
    SubPlan 1
      ->  Limit
            ->  Aggregate
                  ->  Seq Scan on t1
                        Filter: (x1 = t.x)
(7 rows)

G.4.5.2. Subquery Caching Limitations #

pgpro_planner cannot cache results of a correlated subquery in the following cases:

  • A subquery uses volatile functions.

  • A subquery uses non-standard operators.

  • A subquery uses operands with different types without compatible hash operator classes.

  • A subquery uses the min or max aggregate functions.

  • A query plan tree does not contain the corresponding subplan node.

  • The estimated cost of the target query plan with the Memoize nodes is greater than the cost of the initial plan.

G.4.6. Configuration Parameters #

pgpro_planner.enable (boolean) #

Enables or disables the pgpro_planner extension. The default value is off to avoid implicit or unexpected query plan changes.

pgpro_planner.enable_values_transformation (boolean) #

Enables or disables transformation of x IN (VALUES (...), (...)) constructions into x = ANY([...]) constructions. The default value is on.

pgpro_planner.enable_simplify_trivials (boolean) #

Enables or disables simplification of trivial arithmetic operations. The default value is on.

pgpro_planner.memoize_subplan (boolean) #

Enables or disables caching of correlated subqueries. The default value is on.

pgpro_planner.memoize_check_uniqueness (boolean) #

Enables or disables the smart logic to check uniqueness of column values for caching of correlated subqueries. The default value is on.

pgpro_planner.force_memoize_subplan (boolean) #

Enables or disables automatic addition of the Memoize node for all correlated subqueries ignoring the plan cost model and uniqueness check. This parameter is designed only for debugging purposes, do not use it in production. The default value is off.