Thread: Unexpected Performance for the Function simplify_function

Unexpected Performance for the Function simplify_function

From: Ba Jinsheng
Dear PostgreSQL Community,

For query 10 of the TPC-H benchmark:
select
      c_custkey,
      c_name,
      sum(l_extendedprice * (1 - l_discount)) as revenue,
      c_acctbal,
      n_name,
      c_address,
      c_phone,
      c_comment
from
      CUSTOMER,
      ORDERS,
      LINEITEM,
      NATION
where
      c_custkey = o_custkey
      and l_orderkey = o_orderkey
      and o_orderdate >= date '1993-08-01'
      and o_orderdate < date '1993-08-01' + interval '3' month
      and l_returnflag = 'R'
      and c_nationkey = n_nationkey
group by
      c_custkey,
      c_name,
      c_acctbal,
      c_phone,
      n_name,
      c_address,
      c_comment
order by
      revenue desc
limit
      20;


Its query plan is:

                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=200479.71..200479.76 rows=20 width=205) (actual time=506.558..510.594 rows=20 loops=1)
   ->  Sort  (cost=200479.71..200622.37 rows=57064 width=205) (actual time=506.557..510.591 rows=20 loops=1)
         Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric - lineitem.l_discount)))) DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Finalize GroupAggregate  (cost=191629.63..198961.25 rows=57064 width=205) (actual time=441.132..501.986 rows=37925 loops=1)
               Group Key: customer.c_custkey, nation.n_name
               ->  Gather Merge  (cost=191629.63..197772.41 rows=47554 width=205) (actual time=441.124..474.623 rows=37925 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=190629.61..191283.48 rows=23777 width=205) (actual time=437.497..464.923 rows=12642 loops=3)
                           Group Key: customer.c_custkey, nation.n_name
                           ->  Sort  (cost=190629.61..190689.05 rows=23777 width=185) (actual time=437.485..441.339 rows=38183 loops=3)
                                 Sort Key: customer.c_custkey, nation.n_name
                                 Sort Method: external merge  Disk: 7184kB
                                 Worker 0:  Sort Method: external merge  Disk: 7448kB
                                 Worker 1:  Sort Method: external merge  Disk: 7264kB
                                 ->  Hash Join  (cost=181606.66..186706.85 rows=23777 width=185) (actual time=385.555..418.269 rows=38183 loops=3)
                                       Hash Cond: (customer.c_nationkey = nation.n_nationkey)
                                       ->  Parallel Hash Join  (cost=181605.09..186632.29 rows=23777 width=160) (actual time=385.484..411.936 rows=38183 loops=3)
                                             Hash Cond: (customer.c_custkey = orders.o_custkey)
                                             ->  Parallel Seq Scan on customer  (cost=0.00..4225.00 rows=62500 width=148) (actual time=0.028..9.805 rows=50000 loops=3)
                                             ->  Parallel Hash  (cost=181307.88..181307.88 rows=23777 width=16) (actual time=385.060..385.063 rows=38183 loops=3)
                                                   Buckets: 131072 (originally 65536)  Batches: 1 (originally 1)  Memory Usage: 7648kB
                                                   ->  Parallel Hash Join  (cost=35809.22..181307.88 rows=23777 width=16) (actual time=69.608..371.381 rows=38183 loops=3)
                                                         Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                                         ->  Parallel Seq Scan on lineitem  (cost=0.00..143863.66 rows=622855 width=16) (actual time=0.024..255.818 rows=492957 loops=3)
                                                               Filter: (l_returnflag = 'R'::bpchar)
                                                               Rows Removed by Filter: 1507448
                                                         ->  Parallel Hash  (cost=35511.00..35511.00 rows=23858 width=8) (actual time=68.857..68.858 rows=19046 loops=3)
                                                               Buckets: 65536  Batches: 1  Memory Usage: 2816kB
                                                               ->  Parallel Seq Scan on orders  (cost=0.00..35511.00 rows=23858 width=8) (actual time=0.033..62.907 rows=19046 loops=3)
                                                                     Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < '1993-11-01 00:00:00'::timestamp without time zone))
                                                                     Rows Removed by Filter: 480954
                                       ->  Hash  (cost=1.25..1.25 rows=25 width=33) (actual time=0.037..0.037 rows=25 loops=3)
                                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                             ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=33) (actual time=0.020..0.024 rows=25 loops=3)
 Planning Time: 2.295 ms
 Execution Time: 512.015 ms
(38 rows)


However, if I apply this patch to disable simplify_function():

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b4e085e9d4..155bbd9fbc 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2636,15 +2636,6 @@ eval_const_expressions_mutator(Node *node,
                                 * Code for op/func reduction is pretty bulky, so split it out
                                 * as a separate function.
                                 */
-                               simple = simplify_function(expr->opfuncid,
-                                                                                  expr->opresulttype, -1,
-                                                                                  expr->opcollid,
-                                                                                  expr->inputcollid,
-                                                                                  &args,
-                                                                                  false,
-                                                                                  true,
-                                                                                  true,
-                                                                                  context);
                                if (simple)             /* successfully simplified it */
                                        return (Node *) simple;
 

then we get a more efficient query plan:

                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=120917.71..120917.76 rows=20 width=202) (actual time=336.255..344.250 rows=20 loops=1)
   ->  Sort  (cost=120917.71..120936.18 rows=7387 width=202) (actual time=336.254..344.248 rows=20 loops=1)
         Sort Key: (sum((lineitem.l_extendedprice * ((1)::numeric - lineitem.l_discount)))) DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Finalize GroupAggregate  (cost=119764.35..120721.15 rows=7387 width=202) (actual time=271.425..335.755 rows=37925 loops=1)
               Group Key: customer.c_custkey, nation.n_name
               ->  Gather Merge  (cost=119764.35..120567.25 rows=6156 width=202) (actual time=271.420..309.049 rows=37925 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=118764.33..118856.67 rows=3078 width=202) (actual time=267.897..296.298 rows=12642 loops=3)
                           Group Key: customer.c_custkey, nation.n_name
                           ->  Sort  (cost=118764.33..118772.02 rows=3078 width=182) (actual time=267.881..271.626 rows=38183 loops=3)
                                 Sort Key: customer.c_custkey, nation.n_name
                                 Sort Method: external merge  Disk: 7248kB
                                 Worker 0:  Sort Method: external merge  Disk: 7328kB
                                 Worker 1:  Sort Method: external merge  Disk: 7328kB
                                 ->  Hash Join  (cost=113641.60..118585.99 rows=3078 width=182) (actual time=222.247..249.682 rows=38183 loops=3)
                                       Hash Cond: (customer.c_nationkey = nation.n_nationkey)
                                       ->  Parallel Hash Join  (cost=113640.04..118574.98 rows=3078 width=160) (actual time=222.183..243.657 rows=38183 loops=3)
                                             Hash Cond: (customer.c_custkey = orders.o_custkey)
                                             ->  Parallel Seq Scan on customer  (cost=0.00..4219.00 rows=62500 width=148) (actual time=0.029..5.646 rows=50000 loops=3)
                                             ->  Parallel Hash  (cost=113601.56..113601.56 rows=3078 width=16) (actual time=221.947..221.948 rows=38183 loops=3)
                                                   Buckets: 131072 (originally 8192)  Batches: 1 (originally 1)  Memory Usage: 8064kB
                                                   ->  Nested Loop  (cost=0.43..113601.56 rows=3078 width=16) (actual time=0.096..204.897 rows=38183 loops=3)
                                                         ->  Parallel Seq Scan on orders  (cost=0.00..37062.50 rows=3125 width=8) (actual time=0.029..63.908 rows=19046 loops=3)
                                                               Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < ('1993-08-01'::date + '3 mons'::interval month)))
                                                               Rows Removed by Filter: 480954
                                                         ->  Index Scan using lineitem_pkey on lineitem  (cost=0.43..24.45 rows=4 width=16) (actual time=0.006..0.007 rows=2 loops=57138)
                                                               Index Cond: (l_orderkey = orders.o_orderkey)
                                                               Filter: (l_returnflag = 'R'::bpchar)
                                                               Rows Removed by Filter: 2
                                       ->  Hash  (cost=1.25..1.25 rows=25 width=30) (actual time=0.034..0.035 rows=25 loops=3)
                                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                             ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=30) (actual time=0.018..0.023 rows=25 loops=3)
 Planning Time: 1.207 ms
 Execution Time: 345.427 ms
(36 rows)


The estimated cost is reduced by 39.69%, and the execution time is reduced by 32.54%. The reported execution time is the average over 10 runs.
I am not proposing a fixing patch, as the patch is incorrect. Instead, I just want to show that disabling the simplify_function() function brings a performance benefit, and it seems unexpected. I am wondering whether we can optimize simplify_function() to make the performance better for this workload?



Environment:
I used 1 GB of TPC-H benchmark data, and my entire data folder can be downloaded here: https://drive.google.com/file/d/1ZBLHanIRwxbaMQIhRUSPv4I7y8g_0AWi/view?usp=sharing
The connection string is: postgresql://ubuntu:ubuntu@127.0.0.1:5432/tpch

tpch=# select version();
                                             version                                              
--------------------------------------------------------------------------------------------------
 PostgreSQL 17.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
(1 row)


Best regards,

Jinsheng Ba

 

Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.

Re: Unexpected Performance for the Function simplify_function

From: Shiv Iyer

Hello, 


The query plans and results you shared illustrate the unexpected performance differences between using and bypassing the simplify_function() logic in PostgreSQL’s optimizer. Here’s an in-depth analysis and thoughts on optimizing this scenario:


Overview of the Problem


The purpose of simplify_function() in PostgreSQL’s optimizer is to replace function calls with constant values when possible, thus simplifying expressions. However, as demonstrated in your results, this simplification appears to have an adverse effect on the performance of a specific workload. Disabling simplify_function() in this case leads to a notable reduction in both the estimated query cost and the execution time.
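
For concreteness, the folding in question is directly visible in EXPLAIN
output. A minimal sketch, assuming the TPC-H schema from the original
report is loaded:

-- On a stock build, the date bound in Q10 is immutable operators applied
-- to constants, so the planner folds it to a single timestamp at plan time.
EXPLAIN (COSTS OFF)
SELECT o_orderkey
FROM orders
WHERE o_orderdate >= date '1993-08-01'
  AND o_orderdate <  date '1993-08-01' + interval '3' month;
-- The Filter line then shows the pre-computed constant, as in the first
-- plan above:
--   (o_orderdate < '1993-11-01 00:00:00'::timestamp without time zone)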


Performance Gains Observed


Estimated Cost Reduction: 39.69%

Execution Time Reduction: 32.54%


Key Insights from the Execution Plans


1. Hash Joins and Parallel Execution: In both execution plans, PostgreSQL is utilizing hash joins and parallel execution to distribute the workload across multiple workers. The gather merge phase in both plans confirms that multiple parallel workers are involved.

2. Sorting and Finalization: Both query plans include a final group aggregation step followed by sorting on the computed revenue (sum(l_extendedprice * (1 - l_discount))). The main difference observed is in how the simplified or unsimplified revenue calculation impacts the query cost.

3. Disk I/O Reduction: The plan without simplify_function() seems to leverage more efficient disk usage, especially with external merge sorts. The reduction in disk usage indicates that PostgreSQL can better leverage in-memory operations in the absence of function simplification.


Why Might simplify_function() be Causing Problems?


The exact issue seems to lie in how simplify_function() transforms expressions. There could be multiple factors at play, including:


1. Expression Over-Simplification: It’s possible that simplify_function() results in a less optimal execution plan due to over-simplification, which reduces PostgreSQL’s ability to leverage index scans, joins, or parallelism effectively.

2. Changes in Cost Estimation: When the expression is simplified, it may lead to changes in cost estimation, causing the planner to choose a different join order or sorting strategy that is less optimal.

3. Overhead of Repeated Computation: The simplification might be causing PostgreSQL to repeatedly compute certain expressions that could have been computed once within a more complex expression.


Potential Strategies to Address This Problem


To address this problem, we have a few avenues to explore within the PostgreSQL planner and optimizer:


1. Investigate and Optimize Simplify Function Logic:

Understand the specific cases where simplify_function() introduces overhead or leads to suboptimal execution plans. This could involve reviewing whether the simplification is inadvertently forcing PostgreSQL to choose less efficient join strategies, index scans, or aggregations.

Analyze the functions targeted by simplify_function() and consider how their simplification impacts the plan selection logic.

2. Fine-Tuning Cost Estimation:

PostgreSQL’s cost estimation formulas could be missing certain cost elements introduced by the simplified expressions. Reviewing and refining the cost model for simplified expressions could lead to more accurate plan choices.

3. Expression Memoization or Pre-Aggregation:

One approach could be to examine whether simplifying repeated expressions could lead to recomputation inefficiencies. Instead of simplifying certain functions, PostgreSQL could cache or pre-aggregate expressions where appropriate, avoiding unnecessary recomputation.


Recommendations to Move Forward


Since the results you presented indicate a substantial improvement when bypassing the simplify_function(), it suggests that there’s room for optimization within this specific area of the PostgreSQL optimizer.


1. Perform More Tests with Varying Workloads: Test whether disabling the simplify_function() yields consistent benefits across different workloads or if it’s specific to this TPC-H Query 10 pattern.

2. Community Discussion: The results suggest that this may warrant a discussion with the PostgreSQL development community, especially if it’s not an isolated case. Optimizations in core PostgreSQL logic should go through community review and testing.

3. Patch Submission: If you have identified specific changes to the simplify_function() that improve efficiency, prepare a patch for submission to the PostgreSQL community, ensuring that the patch is thoroughly reviewed and benchmarked.


Conclusion


The primary takeaway is that simplify_function() seems to be triggering less efficient query plans in this case, likely due to over-simplification of expressions. The solution should involve a detailed examination of the expression transformations and their interaction with PostgreSQL’s planner. This could lead to refining either the expression simplification logic or the cost estimation for such transformed expressions.


If this behavior is generalizable beyond just TPC-H Query 10, it could be a valuable enhancement to PostgreSQL’s optimizer. Otherwise, a targeted fix for specific scenarios might be sufficient.


On Fri, Oct 25, 2024 at 1:14 AM Ba Jinsheng <bajinsheng@u.nus.edu> wrote:
> [ full quote of the original message trimmed ]


--

Best Regards 
Shiv Iyer 

Re: Unexpected Performance for the Function simplify_function

From: Tom Lane
Ba Jinsheng <bajinsheng@u.nus.edu> writes:
> I am not proposing a fixing patch, as the patch is incorrect. Instead, I just
> want to show that disabling the simplify_function() function brings a
> performance benefit, and it seems unexpected. I am wondering whether we can
> optimize simplify_function() to make the performance better for this workload?

The improvement here seems purely accidental.  Basically, you have
broken the rowcount estimation for the "orders" scan: instead of

>                ->  Parallel Seq Scan on orders  (cost=0.00..35511.00 rows=23858 width=8) (actual time=0.033..62.907 rows=19046 loops=3)
>                      Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < '1993-11-01 00:00:00'::timestamp without time zone))
>                      Rows Removed by Filter: 480954

we get

>                ->  Parallel Seq Scan on orders  (cost=0.00..37062.50 rows=3125 width=8) (actual time=0.029..63.908 rows=19046 loops=3)
>                      Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < ('1993-08-01'::date + '3 mons'::interval month)))
>                      Rows Removed by Filter: 480954

The rows estimate is a good deal further from reality, as a
consequence of the fact that scalarltsel is now delivering a default
guess instead of a real estimate based on a constant comparison value.
Somehow, that's nonetheless leading to a better plan choice.  It could
be that the underlying problem is a poor cost model for parallel hash
join.  Or it could be as simple as random_page_cost not being tuned
for your environment.  It looks like the better plan involves a
nestloop with inner indexscan on lineitem, which is something whose
estimated cost depends enormously on random_page_cost.  You've given
us exactly zero detail about your test conditions, so it's hard to say
more than that.
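
(For what it's worth, the fallback is easy to reproduce on a stock build by
hiding the comparison bound behind a volatile function, which the planner
cannot pre-evaluate at estimation time. cutoff_ts() below is a made-up
helper, for illustration only.)

-- Volatile, so estimate_expression_value() cannot reduce the bound to a
-- constant, and scalarltsel falls back to a default selectivity guess.
CREATE FUNCTION cutoff_ts() RETURNS timestamp
LANGUAGE sql VOLATILE
AS $$ SELECT date '1993-08-01' + interval '3' month $$;

-- Real estimate: the bound folds to a constant at plan time.
EXPLAIN SELECT count(*) FROM orders
WHERE o_orderdate >= date '1993-08-01'
  AND o_orderdate < date '1993-08-01' + interval '3' month;

-- Default guess: the same rows qualify, but the planner sees a
-- non-constant bound.
EXPLAIN SELECT count(*) FROM orders
WHERE o_orderdate >= date '1993-08-01'
  AND o_orderdate < cutoff_ts();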

            regards, tom lane



Re: Unexpected Performance for the Function simplify_function

From: Tom Lane
Shiv Iyer <shiv@minervadb.com> writes:
> The query plans and results you shared illustrate the unexpected
> performance differences between using and bypassing the simplify_function()
> logic in PostgreSQL’s optimizer. Here’s an in-depth analysis and thoughts
> on optimizing this scenario:

Just out of curiosity, was this AI-generated?

            regards, tom lane



Re: Unexpected Performance for the Function simplify_function

From: David G. Johnston
On Thursday, October 24, 2024, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Shiv Iyer <shiv@minervadb.com> writes:
> The query plans and results you shared illustrate the unexpected
> performance differences between using and bypassing the simplify_function()
> logic in PostgreSQL’s optimizer. Here’s an in-depth analysis and thoughts
> on optimizing this scenario:

Just out of curiosity, was this AI-generated?


Nowadays, failing the Turing Test means producing too comprehensive of an answer to a query/situation as opposed to looking too incapable.

The real question is what prompt did the author give to which tool, and maybe, if we are feeling generous, how much editing of the response was performed.

David J.

Re: Unexpected Performance for the Function simplify_function

From: Greg Sabino Mullane
On Thu, Oct 24, 2024 at 3:49 PM Shiv Iyer <shiv@minervadb.com> wrote:

Hello, 


The query plans and results you shared illustrate the unexpected performance differences between using and bypassing the simplify_function() logic in PostgreSQL’s optimizer. Here’s an in-depth analysis and thoughts on optimizing this scenario:


Please don't post LLM-generated answers to this list. Or any Postgres list.


Re: Unexpected Performance for the Function simplify_function

From: Andrei Lepikhov
On 10/25/24 02:43, Ba Jinsheng wrote:
> I am not proposing a fixing patch, as the patch is incorrect. Instead, I
> just want to show that disabling the simplify_function() function brings a
> performance benefit, and it seems unexpected. I am wondering whether we
> can optimize simplify_function() to make the performance better for this
> workload?
I also looked into your case. Using AQO and settling the correct
cardinalities in each node, I found that the plan doesn't change at all.
So, I wonder if you could analyse the path-choosing logic, determine the
costs of competing paths, and explain why NestLoop wasn't chosen.
Maybe there is some kind of early selectivity estimation error, or
something even deeper: the specific distribution of tuples across blocks
of the heap table.

-- 
regards, Andrei Lepikhov




Re: Unexpected Performance for the Function simplify_function

From: Ba Jinsheng
> So, I wonder if you could analyse the path-choosing logic, determine the
> costs of competing paths, and explain why NestLoop wasn't chosen.

To be honest, it is a bit challenging for me. 
I guess the better query plan is not considered when comparing the cost of paths?



Best regards,

Jinsheng Ba

 



From: Andrei Lepikhov <lepihov@gmail.com>
Sent: Friday, October 25, 2024 4:13 AM
To: Ba Jinsheng <bajinsheng@u.nus.edu>
Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
Subject: Re: Unexpected Performance for the Function simplify_function
 
        - External Email -



On 10/25/24 02:43, Ba Jinsheng wrote:
> I am not proposing a fixing patch, as the patch is incorrect. Instead, I
> just want to show disabling the simplify_function() function brings
> performance benefit, and it seems unexpected. I am wondering whether we
> can optimize simplify_function() to make the performance better for this
> workload?
I also discovered your case. Using AQO and settling the correct
cardinalities in each node, I found that the plan doesn't change at all.
So, I wonder if you could analyse the path-choosing logic, determine the
costs of competing paths, and explain why NestLoop wasn't chosen.
Maybe there is kind of early selectivity estimation error or something
even more deep: specific tuples distribution across blocks of the heap
table.

--
regards, Andrei Lepikhov

Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.

Re: Unexpected Performance for the Function simplify_function

From: Ba Jinsheng
> It looks like the better plan involves a
> nestloop with inner indexscan on lineitem, which is something whose
> estimated cost depends enormously on random_page_cost.  You've given
> us exactly zero detail about your test conditions, so it's hard to say
> more than that.

I used the default configuration in src/backend/utils/misc/postgresql.conf.sample,
so random_page_cost = 4.0.

Best regards,

Jinsheng Ba



Re: Unexpected Performance for the Function simplify_function

From: Andrei Lepikhov
On 10/25/24 16:26, Ba Jinsheng wrote:
>  >So, I wonder if you could analyse the path-choosing logic, determine the
> costs of competing paths, and explain why NestLoop wasn't chosen.
> 
> To be honest, it is a bit challenging for me.
> I guess the better query plan is not considered when comparing the cost 
> of paths?
I personally print all the paths (by adding an elog message inside the
code) and their costs for a specific join. Some paths are filtered out at
earlier stages and never even make it into the pathlist.
Basically, Postgres already has some extended logging for the optimiser
paths - use the -DOPTIMIZER_DEBUG compilation flag.
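
Before recompiling, the stock enable_* GUCs can give a first approximation:
forcing the planner off the hash join makes EXPLAIN print the cost it
assigns to the nestloop shape. A rough sketch:

BEGIN;
SET LOCAL enable_hashjoin = off;
-- EXPLAIN the Q10 query from upthread here, and compare the plan's
-- top-level cost against the hash-join plan's cost in the earlier output.
ROLLBACK;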

-- 
regards, Andrei Lepikhov




Re: Unexpected Performance for the Function simplify_function

From: David Rowley
On Fri, 25 Oct 2024 at 22:26, Ba Jinsheng <bajinsheng@u.nus.edu> wrote:
> I guess the better query plan is not considered when comparing the cost of paths?

You might want to check that effective_cache_size is set high enough.
Something like 50-75% of RAM is likely fine.
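
For example, a minimal sketch (the 48GB figure is only illustrative, for a
machine with 64GB of RAM):

SHOW effective_cache_size;                       -- the default is just 4GB
ALTER SYSTEM SET effective_cache_size = '48GB';  -- roughly 75% of RAM
SELECT pg_reload_conf();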

David



Re: Unexpected Performance for the Function simplify_function

From: Tom Lane
Ba Jinsheng <bajinsheng@u.nus.edu> writes:
>> It looks like the better plan involves a
>> nestloop with inner indexscan on lineitem, which is something whose
>> estimated cost depends enormously on random_page_cost.  You've given
>> us exactly zero detail about your test conditions, so it's hard to say
>> more than that.

> I used the default configuration in the file src/backend/utils/misc/postgresql.conf.sample
> So the random_page_cost = 4.0

You're still admitting to nothing as to the hardware you are running
this test on.

However, 4.0 is a number we chose decades ago based on typical
performance of spinning-rust storage.  It's not very appropriate
for SSD or similar storage -- numbers just a bit above 1 are
probably the most appropriate thing for that kind of storage.
(There are ongoing discussions about changing the setting's default
value, but so far not backed by any great deal of hard evidence.)
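
(Illustrative sketch only; the value can be tried per-session before
persisting anything:)

SET random_page_cost = 1.1;   -- affects only the current session
-- re-run EXPLAIN ANALYZE on the query; if the nestloop plan now wins:
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();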

            regards, tom lane



Re: Unexpected Performance for the Function simplify_function

From: Ba Jinsheng
> You're still admitting to nothing as to the hardware you are running
> this test on.

I still do not get what you are asking. Do you want to know whether the disk is an SSD or not?

It is an HDD, with an AMD 3990X 64-core CPU and 256GB RAM, running Ubuntu 22.04.


Best regards,

Jinsheng Ba

 


