Re: Unexpected Performance for the Function simplify_function - Mailing list pgsql-performance

From Shiv Iyer
Subject Re: Unexpected Performance for the Function simplify_function
Msg-id CAALLqt99Bw+tBER8naQKf=__0yFm+b5MeXV2hjr45_g8oO+aig@mail.gmail.com
In response to Unexpected Performance for the Function simplify_function (Ba Jinsheng <bajinsheng@u.nus.edu>)
List pgsql-performance

Hello, 


The query plans and results you shared illustrate the unexpected performance differences between using and bypassing the simplify_function() logic in PostgreSQL’s optimizer. Here’s an in-depth analysis and thoughts on optimizing this scenario:


Overview of the Problem


The purpose of simplify_function() in PostgreSQL’s optimizer is to pre-evaluate calls to immutable functions and operators whose arguments are already constants, replacing them with constant values at plan time and thus simplifying expressions; in query 10 below, it is the upper bound date '1993-08-01' + interval '3' month that gets folded. However, as your results demonstrate, this simplification has an adverse effect on this particular workload: disabling simplify_function() leads to a notable reduction in both the estimated query cost and the execution time.
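
As a minimal illustration of the folding (nothing patch-specific; the folded constant is exactly what appears in the orders Filter line of the first plan quoted below):

EXPLAIN VERBOSE
SELECT date '1993-08-01' + interval '3' month;
-- The Result node's Output line shows the already-folded constant instead of
-- the original expression:
--   Output: '1993-11-01 00:00:00'::timestamp without time zone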


Performance Gains Observed


Estimated Cost Reduction: 39.69%

Execution Time Reduction: 32.54%
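
For reference, these percentages follow directly from the top-level plan costs and total execution times quoted further down (200479.71 vs 120917.71, and 512.015 ms vs 345.427 ms):

SELECT round(100 * (200479.71 - 120917.71) / 200479.71, 2) AS cost_reduction_pct,       -- 39.69
       round(100 * (512.015 - 345.427) / 512.015, 2)       AS exec_time_reduction_pct;  -- 32.54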


Key Insights from the Execution Plans


1. Hash Joins and Parallel Execution: Both execution plans use two parallel workers (confirmed by the Gather Merge node and the Workers Launched lines) and hash joins against customer and nation. The plans only diverge further down, at the join between orders and lineitem.

2. Sorting and Finalization: Both plans finish with a partial/finalize group aggregation followed by a top-N heapsort on the computed revenue (sum(l_extendedprice * (1 - l_discount))). These upper nodes are essentially identical; the only visible difference there is cosmetic (the constant 1 is spelled '1'::numeric in one plan and (1)::numeric in the other), so the cost gap originates below the aggregation.

3. Join Strategy for lineitem: The substantive difference is the innermost join. With simplify_function() active, the planner estimates 23777 rows out of the orders/lineitem join and chooses a Parallel Seq Scan on lineitem feeding a Parallel Hash Join; with it disabled, the estimate drops to 3078 rows and the planner switches to a Nested Loop driving Index Scans on lineitem_pkey, which is where most of the time is saved. The external merge sorts, by contrast, use roughly the same amount of disk in both plans (about 7 MB per worker), so disk I/O is not the differentiator.


Why Might simplify_function() be Causing Problems?


The exact issue seems to lie in how simplify_function() transforms expressions. There could be multiple factors at play, including:


1. Expression Over-Simplification: It is possible that simplify_function() produces a less optimal execution plan not because the folded expression is wrong, but because the constant it produces changes what the planner sees, steering it away from the index-scan/nested-loop path on lineitem that the unsimplified plan uses.

2. Changes in Cost Estimation: With the folded constant, the planner estimates 23858 matching orders rows per worker (close to the 19046 actually found); without it, the estimate falls back to 3125 rows. It is that change in estimated row count, propagated up through the joins, that makes the planner pick a different join strategy, and in this case the less accurate estimate happens to yield the faster plan. A quick way to inspect the estimate for the date restriction in isolation is sketched after this list.

3. Run-Time Evaluation Cost: Note that the reverse concern also applies: when the expression is not simplified, the executor has to evaluate it at run time instead of the planner computing it once, so any benefit from skipping the simplification must come from the improved plan shape rather than from cheaper expression evaluation.
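
Point 2 above is easy to check in isolation with a plain EXPLAIN ANALYZE (nothing patch-specific): run just the date restriction on orders and compare the scan node's estimated rows against the actual rows it returns:

EXPLAIN (ANALYZE, SUMMARY OFF)
SELECT *
FROM orders
WHERE o_orderdate >= date '1993-08-01'
  AND o_orderdate <  date '1993-08-01' + interval '3' month;
-- On a stock build the Filter line again shows the folded timestamp constant,
-- and the estimated row count should track the actual count reasonably well.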


Potential Strategies to Address This Problem


To address this problem, we have a few avenues to explore within the PostgreSQL planner and optimizer:


1. Investigate and Optimize Simplify Function Logic:

Understand the specific cases where simplify_function() introduces overhead or leads to suboptimal execution plans. This could involve reviewing whether the simplification is inadvertently forcing PostgreSQL to choose less efficient join strategies, index scans, or aggregations.

Analyze the functions targeted by simplify_function() and consider how their simplification impacts the plan selection logic.

2. Fine-Tuning Cost Estimation:

PostgreSQL’s cost model may be mispricing one of the two plan shapes here: the folded constant actually gives the more accurate row estimate, yet the plan chosen on top of it is the slower one. Reviewing how that estimate feeds the join and sort costing for this pattern could lead to more accurate plan choices.

3. Expression Memoization or Pre-Aggregation:

Instead of folding certain expressions at plan time, PostgreSQL could keep them visible to the planner and cache their evaluated values at the executor level where appropriate, avoiding per-row recomputation without committing the planner to the folded form.


Recommendations to Move Forward


The results you presented indicate a substantial improvement when bypassing simplify_function(), which suggests there is room for optimization in this specific area of the PostgreSQL planner.


1. Perform More Tests with Varying Workloads: Test whether disabling simplify_function() yields consistent benefits across different workloads and parameter values, or whether it is specific to this TPC-H Query 10 pattern. One way to keep a single expression unsimplified on an unpatched server is sketched after this list.

2. Community Discussion: The results suggest that this may warrant a discussion with the PostgreSQL development community, especially if it’s not an isolated case. Optimizations in core PostgreSQL logic should go through community review and testing.

3. Patch Submission: If you have identified specific changes to the simplify_function() that improve efficiency, prepare a patch for submission to the PostgreSQL community, ensuring that the patch is thoroughly reviewed and benchmarked.
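
On point 1, if you want to experiment on an unpatched server, one way to keep a specific expression from being folded at plan time is to hide it behind a PL/pgSQL function declared STABLE, since only IMMUTABLE expressions are pre-evaluated during planning. This is only a sketch for experimentation (the function name is made up), and it will not necessarily reproduce the patched plan, because selectivity estimation is still free to evaluate stable expressions:

-- Hypothetical helper, for experimentation only: declared STABLE, so the
-- planner's constant-folding pass will not replace the call with a Const.
CREATE FUNCTION q10_cutoff() RETURNS timestamp
LANGUAGE plpgsql STABLE AS
$$
BEGIN
    RETURN date '1993-08-01' + interval '3' month;
END;
$$;

-- In query 10, replace
--     o_orderdate < date '1993-08-01' + interval '3' month
-- with
--     o_orderdate < q10_cutoff()
-- and compare the EXPLAIN (ANALYZE) output against the two plans quoted below.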


Conclusion


The primary takeaway is that simplify_function() seems to be triggering a less efficient query plan in this case, most likely because folding the date arithmetic into a constant changes the row estimates the planner works with and, through them, the chosen join strategy. The solution should involve a detailed examination of how the folded expressions interact with selectivity and cost estimation in the planner; that could lead to refining either the expression simplification logic or the costing of the plans built on top of it.


If this behavior is generalizable beyond just TPC-H Query 10, it could be a valuable enhancement to PostgreSQL’s optimizer. Otherwise, a targeted fix for specific scenarios might be sufficient.


On Fri, Oct 25, 2024 at 1:14 AM Ba Jinsheng <bajinsheng@u.nus.edu> wrote:
Dear PostgreSQL Community,

For the query 10 in TPC-H benchmark:
select
      c_custkey,
      c_name,
      sum(l_extendedprice * (1 - l_discount)) as revenue,
      c_acctbal,
      n_name,
      c_address,
      c_phone,
      c_comment
from
      CUSTOMER,
      ORDERS,
      LINEITEM,
      NATION
where
      c_custkey = o_custkey
      and l_orderkey = o_orderkey
      and o_orderdate >= date '1993-08-01'
      and o_orderdate < date '1993-08-01' + interval '3' month
      and l_returnflag = 'R'
      and c_nationkey = n_nationkey
group by
      c_custkey,
      c_name,
      c_acctbal,
      c_phone,
      n_name,
      c_address,
      c_comment
order by
      revenue desc
limit
      20;


Its query plan is:

                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=200479.71..200479.76 rows=20 width=205) (actual time=506.558..510.594 rows=20 loops=1)
   ->  Sort  (cost=200479.71..200622.37 rows=57064 width=205) (actual time=506.557..510.591 rows=20 loops=1)
         Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric - lineitem.l_discount)))) DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Finalize GroupAggregate  (cost=191629.63..198961.25 rows=57064 width=205) (actual time=441.132..501.986 rows=37925 loops=1)
               Group Key: customer.c_custkey, nation.n_name
               ->  Gather Merge  (cost=191629.63..197772.41 rows=47554 width=205) (actual time=441.124..474.623 rows=37925 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=190629.61..191283.48 rows=23777 width=205) (actual time=437.497..464.923 rows=12642 loops=3)
                           Group Key: customer.c_custkey, nation.n_name
                           ->  Sort  (cost=190629.61..190689.05 rows=23777 width=185) (actual time=437.485..441.339 rows=38183 loops=3)
                                 Sort Key: customer.c_custkey, nation.n_name
                                 Sort Method: external merge  Disk: 7184kB
                                 Worker 0:  Sort Method: external merge  Disk: 7448kB
                                 Worker 1:  Sort Method: external merge  Disk: 7264kB
                                 ->  Hash Join  (cost=181606.66..186706.85 rows=23777 width=185) (actual time=385.555..418.269 rows=38183 loops=3)
                                       Hash Cond: (customer.c_nationkey = nation.n_nationkey)
                                       ->  Parallel Hash Join  (cost=181605.09..186632.29 rows=23777 width=160) (actual time=385.484..411.936 rows=38183 loops=3)
                                             Hash Cond: (customer.c_custkey = orders.o_custkey)
                                             ->  Parallel Seq Scan on customer  (cost=0.00..4225.00 rows=62500 width=148) (actual time=0.028..9.805 rows=50000 loops=3)
                                             ->  Parallel Hash  (cost=181307.88..181307.88 rows=23777 width=16) (actual time=385.060..385.063 rows=38183 loops=3)
                                                   Buckets: 131072 (originally 65536)  Batches: 1 (originally 1)  Memory Usage: 7648kB
                                                   ->  Parallel Hash Join  (cost=35809.22..181307.88 rows=23777 width=16) (actual time=69.608..371.381 rows=38183 loops=3)
                                                         Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                                         ->  Parallel Seq Scan on lineitem  (cost=0.00..143863.66 rows=622855 width=16) (actual time=0.024..255.818 rows=492957 loops=3)
                                                               Filter: (l_returnflag = 'R'::bpchar)
                                                               Rows Removed by Filter: 1507448
                                                         ->  Parallel Hash  (cost=35511.00..35511.00 rows=23858 width=8) (actual time=68.857..68.858 rows=19046 loops=3)
                                                               Buckets: 65536  Batches: 1  Memory Usage: 2816kB
                                                               ->  Parallel Seq Scan on orders  (cost=0.00..35511.00 rows=23858 width=8) (actual time=0.033..62.907 rows=19046 loops=3)
                                                                     Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < '1993-11-01 00:00:00'::timestamp without time zone))
                                                                     Rows Removed by Filter: 480954
                                       ->  Hash  (cost=1.25..1.25 rows=25 width=33) (actual time=0.037..0.037 rows=25 loops=3)
                                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                             ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=33) (actual time=0.020..0.024 rows=25 loops=3)
 Planning Time: 2.295 ms
 Execution Time: 512.015 ms
(38 rows)


While if I apply this patch to disable the simplify_function():

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b4e085e9d4..155bbd9fbc 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2636,15 +2636,6 @@ eval_const_expressions_mutator(Node *node,
                                 * Code for op/func reduction is pretty bulky, so split it out
                                 * as a separate function.
                                 */
-                               simple = simplify_function(expr->opfuncid,
-                                                                                  expr->opresulttype, -1,
-                                                                                  expr->opcollid,
-                                                                                  expr->inputcollid,
-                                                                                  &args,
-                                                                                  false,
-                                                                                  true,
-                                                                                  true,
-                                                                                  context);
                                if (simple)             /* successfully simplified it */
                                        return (Node *) simple;
 

Then we can get a more efficient query plan:

                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=120917.71..120917.76 rows=20 width=202) (actual time=336.255..344.250 rows=20 loops=1)
   ->  Sort  (cost=120917.71..120936.18 rows=7387 width=202) (actual time=336.254..344.248 rows=20 loops=1)
         Sort Key: (sum((lineitem.l_extendedprice * ((1)::numeric - lineitem.l_discount)))) DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Finalize GroupAggregate  (cost=119764.35..120721.15 rows=7387 width=202) (actual time=271.425..335.755 rows=37925 loops=1)
               Group Key: customer.c_custkey, nation.n_name
               ->  Gather Merge  (cost=119764.35..120567.25 rows=6156 width=202) (actual time=271.420..309.049 rows=37925 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=118764.33..118856.67 rows=3078 width=202) (actual time=267.897..296.298 rows=12642 loops=3)
                           Group Key: customer.c_custkey, nation.n_name
                           ->  Sort  (cost=118764.33..118772.02 rows=3078 width=182) (actual time=267.881..271.626 rows=38183 loops=3)
                                 Sort Key: customer.c_custkey, nation.n_name
                                 Sort Method: external merge  Disk: 7248kB
                                 Worker 0:  Sort Method: external merge  Disk: 7328kB
                                 Worker 1:  Sort Method: external merge  Disk: 7328kB
                                 ->  Hash Join  (cost=113641.60..118585.99 rows=3078 width=182) (actual time=222.247..249.682 rows=38183 loops=3)
                                       Hash Cond: (customer.c_nationkey = nation.n_nationkey)
                                       ->  Parallel Hash Join  (cost=113640.04..118574.98 rows=3078 width=160) (actual time=222.183..243.657 rows=38183 loops=3)
                                             Hash Cond: (customer.c_custkey = orders.o_custkey)
                                             ->  Parallel Seq Scan on customer  (cost=0.00..4219.00 rows=62500 width=148) (actual time=0.029..5.646 rows=50000 loops=3)
                                             ->  Parallel Hash  (cost=113601.56..113601.56 rows=3078 width=16) (actual time=221.947..221.948 rows=38183 loops=3)
                                                   Buckets: 131072 (originally 8192)  Batches: 1 (originally 1)  Memory Usage: 8064kB
                                                   ->  Nested Loop  (cost=0.43..113601.56 rows=3078 width=16) (actual time=0.096..204.897 rows=38183 loops=3)
                                                         ->  Parallel Seq Scan on orders  (cost=0.00..37062.50 rows=3125 width=8) (actual time=0.029..63.908 rows=19046 loops=3)
                                                               Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < ('1993-08-01'::date + '3 mons'::interval month)))
                                                               Rows Removed by Filter: 480954
                                                         ->  Index Scan using lineitem_pkey on lineitem  (cost=0.43..24.45 rows=4 width=16) (actual time=0.006..0.007 rows=2 loops=57138)
                                                               Index Cond: (l_orderkey = orders.o_orderkey)
                                                               Filter: (l_returnflag = 'R'::bpchar)
                                                               Rows Removed by Filter: 2
                                       ->  Hash  (cost=1.25..1.25 rows=25 width=30) (actual time=0.034..0.035 rows=25 loops=3)
                                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                             ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=30) (actual time=0.018..0.023 rows=25 loops=3)
 Planning Time: 1.207 ms
 Execution Time: 345.427 ms
(36 rows)


The estimated cost is reduced by 39.69%, and the execution time is reduced by 32.54%. I measured the execution time as the average of 10 executions.
I am not proposing this as a fix, as the patch is incorrect. Instead, I just want to show that disabling the simplify_function() function brings a performance benefit, which seems unexpected. I am wondering whether we can optimize simplify_function() to make the performance better for this workload?



Environment:
I used 1 GB data of TPC-H benchmark, and my entire data folder can be downloaded here: https://drive.google.com/file/d/1ZBLHanIRwxbaMQIhRUSPv4I7y8g_0AWi/view?usp=sharing
The connection string is: postgresql://ubuntu:ubuntu@127.0.0.1:5432/tpch

tpch=# select version();
                                             version                                              
--------------------------------------------------------------------------------------------------
 PostgreSQL 17.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
(1 row)


Best regards,

Jinsheng Ba

 

Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.


--

Best Regards 
Shiv Iyer 
