G.4. pgpro_planner — additional query transformations and optimizations #
G.4.1. Description #
The pgpro_planner extension provides query transformations and optimizations that allow producing efficient query execution plans and improving performance.
These optimizations are the following:
transformation of
x IN (VALUES (...), (...))constructions intox = ANY([...])constructionssimplification of
x + 0,x - 0,x * 1,x / 1, andx * 0operations toxcaching results of correlated subqueries using the
Memoizenodes
By default, the extension is disabled to avoid implicit or unexpected query plan changes. You can enable it as described below. Each transformation can also be enabled or disabled separately using the corresponding configuration parameters to fit specific user needs.
G.4.2. Installation #
The pgpro_planner extension is provided with Postgres Pro Enterprise as a separate pre-built package pgpro-planner-ent-17 (for the detailed installation instructions, see Chapter 17).
To enable pgpro_planner, do the following:
Load the extension in one of the following ways:
globally for all sessions by adding the library name to the shared_preload_libraries variable in the
postgresql.conffileshared_preload_libraries = 'pgpro_planner'
in the current session using the LOAD command without adding the library name to
shared_preload_librariesLOAD 'pgpro_planner';
Reload the database server for the changes to take effect.
To verify that the pgpro_planner library is installed correctly, you can run the following command:
SHOW shared_preload_libraries;
Enable the extension by setting the pgpro_planner.enable parameter to
on.SET pgpro_planner.enable = 'on';
G.4.3. Transformation of VALUES into ANY #
Typically, the standard planner processes x IN (VALUES (...), (...)) constructions in a suboptimal way. If subqueries contain these constructions, the planner can also rewrite them to the JOIN clauses to avoid repeated scans. This can lead to wrong optimization decisions and inefficient query execution plans.
The pgpro_planner extension can transform basic x IN (VALUES (...), (...)) constructions into x = ANY([...]) constructions. This transformation applies to the parse tree of a query before the planner starts to create a query execution plan.
pgpro_planner processes queries as follows:
Searches for
x IN (VALUES (...), (...))constructions in subqueries.Tries to extract all values from the
VALUESexpression.Builds an array from the extracted values.
Creates the corresponding
x = ANY([...])construction.In the parse tree, replaces the initial subquery with the corresponding scalar array operation.
This transformation allows the planner to avoid excessive JOIN clauses and cardinality estimation errors. For ANY([...]) constructions, the planner can also produce query plans with any access method and hash matching strategy that are most effective in each particular case.
You can disable this transformation by setting the pgpro_planner.enable_values_transformation parameter to off, for instance, if you do not use the VALUES expressions.
G.4.3.1. VALUES Transformation Example #
This example demonstrates how pgpro_planner transformation works for the following sample query:
EXPLAIN (COSTS FALSE) SELECT * FROM onek WHERE unique1 IN (VALUES(1200), (1));
Without this transformation, the query execution plan looks as follows:
Nested Loop
-> Unique
-> Sort
Sort Key: "*VALUES*".column1
-> Values Scan on "*VALUES*"
-> Index Scan using onek_unique1 on onek
Index Cond: (unique1 = "*VALUES*".column1)
(7 rows)
For the VALUES expression, the planner creates the corresponding temporary table and performs value comparisons using the Nested Loop and Index Cond nodes. These decisions lead to long query execution time.
When pgpro_planner transforms the VALUES expression, the resulting plan uses the more effective Bitmap Heap Scan node instead of Nested Loop, which reduces query execution time.
Bitmap Heap Scan on onek
Recheck Cond: (unique1 = ANY ('{1200,1}'::integer[]))
-> Bitmap Index Scan on onek_unique1
Index Cond: (unique1 = ANY ('{1200,1}'::integer[]))
(4 rows)
G.4.3.2. VALUES Transformation Limitations #
pgpro_planner cannot transform the VALUES expression in the following cases:
The expression contains volatile functions.
The expression contains non-constant values, for instance,
VALUES (0), (unique1).The expression contains values of complex data types, for instance,
VALUES (1,1), (20,0).The expression contains
NULLvalues.The expression contains the
LIMIT,OFFSET, orORDER BYclauses.The expression is used in a subquery nested within another subquery.
There is no operator for the resulting
x = ANY([...])variable and array.
G.4.4. Simplification of Trivial Arithmetic Operations #
Even for trivial arithmetic operations, such as x + 0, the standard planner predicts selectivity and cardinality as it does for expressions and cannot use indexes in query execution plans.
The pgpro_planner extension can simplify and rewrite the following basic arithmetic operations:
x + 0→xx - 0→xx * 1→xx / 1→xx * 0→x(only ifxis notNULL)
This simplification allows the planner not only to avoid unnecessary calculations, but also to reduce estimation errors, produce plans with indexes and effective access methods, and thus significantly improve performance.
You can disable this simplification by setting the pgpro_planner.enable_simplify_trivials parameter to off, for instance, for debugging purposes.
G.4.4.1. Arithmetic Simplification Example #
This example demonstrates how pgpro_planner arithmetic simplification affects execution plans for the following query:
SELECT * FROM t WHERE x + 0 > 900;
For the initial arithmetic operation, the planner produces the query execution plan with Seq Scan and does not use an index.
Seq Scan on t (cost=0.00..20.00 rows=333 width=4) (actual time=0.500..0.571 rows=100.00 loops=1) Filter: ((x + 0) > 900) Rows Removed by Filter: 900 Buffers: shared hit=5 Planning: Buffers: shared hit=14 Planning Time: 0.432 ms Execution Time: 0.627 ms (8 rows)
When pgpro_planner simplifies this operation, the planner creates the plan with Index Only Scan and performs value comparisons in Index Cond, which helps to reduce query execution time.
Index Only Scan using idx_t_x on t (cost=0.28..10.03 rows=100 width=4) (actual time=0.077..0.247 rows=100.00 loops=1) Index Cond: (x > 900) Heap Fetches: 100 Index Searches: 1 Buffers: shared hit=4 Planning Time: 0.217 ms Execution Time: 0.319 ms (7 rows)
G.4.4.2. Arithmetic Simplification Limitations #
pgpro_planner has the following limitations for arithmetic simplification:
Only the following operations are supported:
x + 0,x - 0,x * 1,x / 1, andx * 0(ifxis notNULL).Only the following data types for constants are supported:
int2,int4,float, andnumeric.Operations that use complex expressions to get
0or1, such asx - 100 - 100, cannot be simplified.Operations that use volatile functions cannot be simplified.
G.4.5. Caching of Correlated Subqueries #
Correlated subqueries use values from their outer queries, for instance, via the WHERE clause. These subqueries are executed once for each row of the outer query, so they can have major impact on performance.
The pgpro_planner extension can cache results of parameterized correlated subqueries by adding the Memoize nodes to query execution plans. This allows skipping identical scans for the same parameters and improving performance.
This functionality is enabled if the global enable_memoize configuration parameter and the pgpro_planner.memoize_subplan parameter of pgpro_planner are set to on (default).
pgpro_planner processes a query during the plan creation stage. It performs the following steps:
pgpro_planner searches for supported correlated subqueries.
If the pgpro_planner.memoize_check_uniqueness parameter is set to
on(default), pgpro_planner checks uniqueness for values of columns referenced in subqueries based on statistical information. This helps to determine whether theMemoizenode can be useful.The
Memoizenode can be added in the following cases:A unique index on a column does not exist, and column values are not unique according to statistical information.
A unique index on a column exists but a column contains
NULLvalues according to statistical information.A multicolumn unique index exists but values of the referenced column are not unique according to statistical information.
Statistical information for a column is not gathered yet regardless of whether an index exists.
pgpro_planner estimates the expected cost of the resulting plan with the
Memoizenode.If the estimated cost is lower than the cost of the initial plan, pgpro_planner adds the
Memoizenode into the head of the corresponding subplan.
You can disable caching of correlated subqueries by setting the pgpro_planner.memoize_subplan parameter to off, for instance, if you use correlated subqueries rarely. To disable uniqueness check for debugging purposes, set the pgpro_planner.memoize_check_uniqueness parameter to off.
G.4.5.1. Subquery Caching Example #
This example demonstrates how caching of correlated subqueries works in different cases.
Create two sample tables named t and t1 and a unique index on the x column of the first table.
CREATE TABLE t (x int, y int); INSERT INTO t SELECT id, id%20 FROM generate_series(1,1000) id; CREATE UNIQUE INDEX t_x_idx ON t (x); CREATE TABLE t1 (x1 int); INSERT INTO t1 SELECT id%2 FROM generate_series(1,1000) id;
Run the following query, where the subquery uses values of the correlated column named y. There is no unique index on this column, so the Memoize node is added to the query execution plan.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.y LIMIT 1);
QUERY PLAN
------------------------------------------------
Seq Scan on t
Filter: (SubPlan 1)
SubPlan 1
-> Memoize
Cache Key: t.y
Cache Mode: binary
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.y)
(10 rows)
Run another query, where the subquery uses values of the correlated column named x. The unique index on this column exists but statistical information is not gathered yet, so the Memoize node is generated.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
QUERY PLAN
------------------------------------------------
Seq Scan on t
Filter: (SubPlan 1)
SubPlan 1
-> Memoize
Cache Key: t.x
Cache Mode: binary
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.x)
(10 rows)
Execute the same query once again. Statistical information for the x column is now gathered, and the unique index still exists, so the Memoize node is not added to the plan.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t
Filter: (x = (SubPlan 1))
SubPlan 1
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.x)
(7 rows)
Remove the previously created index and create a new multicolumn index on both columns named x and y.
CREATE UNIQUE INDEX on t(x,y); DROP INDEX t_x_idx;
Run the query referencing the correlated y column. Although the multicolumn unique index now exists, values of this column are not unique according to statistical information. The Memoize node is added to the resulting plan.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.y LIMIT 1);
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t
Filter: (((x = (SubPlan 1).col1) AND (y = (SubPlan 1).col2)))
SubPlan 1
-> Memoize
Cache Key: t.y
Cache Mode: binary
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.y)
(10 rows)
Run the query referencing the correlated x column. For this case, the Memoize node is not still generated, since values of this column are unique according to statistical information.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t
Filter: (((x = (SubPlan 1).col1) AND (y = (SubPlan 1).col2)))
SubPlan 1
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.x)
(7 rows)
G.4.5.2. Subquery Caching Limitations #
pgpro_planner cannot cache results of a correlated subquery in the following cases:
A subquery uses volatile functions.
A subquery uses non-standard operators.
A subquery uses operands with different types without compatible hash operator classes.
A subquery uses the
minormaxaggregate functions.A query plan tree does not contain the corresponding subplan node.
The estimated cost of the target query plan with the
Memoizenodes is greater than the cost of the initial plan.
G.4.6. Configuration Parameters #
-
pgpro_planner.enable(boolean) # Enables or disables the pgpro_planner extension. The default value is
offto avoid implicit or unexpected query plan changes.-
pgpro_planner.enable_values_transformation(boolean) # Enables or disables transformation of
x IN (VALUES (...), (...))constructions intox = ANY([...])constructions. The default value ison.-
pgpro_planner.enable_simplify_trivials(boolean) # Enables or disables simplification of trivial arithmetic operations. The default value is
on.-
pgpro_planner.memoize_subplan(boolean) # Enables or disables caching of correlated subqueries. The default value is
on.-
pgpro_planner.memoize_check_uniqueness(boolean) # Enables or disables the smart logic to check uniqueness of column values for caching of correlated subqueries. The default value is
on.-
pgpro_planner.force_memoize_subplan(boolean) # Enables or disables automatic addition of the
Memoizenode for all correlated subqueries ignoring the plan cost model and uniqueness check. This parameter is designed only for debugging purposes, do not use it in production. The default value isoff.