Re: Performance Issue on Query 18 of TPC-H Benchmark - Mailing list pgsql-bugs

From Alena Rybakina
Subject Re: Performance Issue on Query 18 of TPC-H Benchmark
Date
Msg-id 955eefa0-2158-4604-b79b-68a087646e91@postgrespro.ru
In response to Re: Performance Issue on Query 18 of TPC-H Benchmark  (Ba Jinsheng <bajinsheng@u.nus.edu>)
Responses Re: Performance Issue on Query 18 of TPC-H Benchmark
List pgsql-bugs


On 16.10.2024 17:56, Ba Jinsheng wrote:
> I would like to know if you can improve that case by switching from the sorted grouping to a hashed one.

I used this patch to enable a HashAggregate for only the first aggregation in the query plan:
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0c7273b9cc..b410452df1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6983,8 +6983,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
        bool            can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
        List       *havingQual = (List *) extra->havingQual;
        AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
-
-       if (can_sort)
+       static int call_count = 0;
+       call_count++;
+       if (can_sort && call_count != 2)
        {
                /*
                 * Use any available suitably-sorted path as input, and also consider

And got this query plan:
                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=693083.48..693083.73 rows=100 width=71) (actual time=2624.282..2629.392 rows=9 loops=1)
   ->  Sort  (cost=693083.48..694352.29 rows=507522 width=71) (actual time=2624.281..2629.390 rows=9 loops=1)
         Sort Key: orders.o_totalprice DESC, orders.o_orderdate
         Sort Method: quicksort  Memory: 25kB
         ->  HashAggregate  (cost=658421.05..673686.36 rows=507522 width=71) (actual time=2624.162..2629.346 rows=9 loops=1)
               Group Key: customer.c_custkey, orders.o_orderkey
               Planned Partitions: 32  Batches: 1  Memory Usage: 793kB
               ->  Gather  (cost=459569.18..608779.05 rows=507522 width=44) (actual time=2623.805..2629.229 rows=63 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Nested Loop  (cost=458569.18..557026.85 rows=211468 width=44) (actual time=2581.717..2620.494 rows=21 loops=3)
                           ->  Parallel Hash Join  (cost=458568.75..492734.09 rows=52844 width=43) (actual time=2581.704..2620.448 rows=3 loops=3)
                                 Hash Cond: (orders.o_custkey = customer.c_custkey)
                                 ->  Hash Join  (cost=453562.50..487589.13 rows=52844 width=24) (actual time=2541.024..2579.759 rows=3 loops=3)
                                       Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
                                       ->  Parallel Seq Scan on orders  (cost=0.00..32386.00 rows=625000 width=20) (actual time=0.028..32.135 rows=500000 loops=3)
                                       ->  Hash  (cost=451977.19..451977.19 rows=126825 width=4) (actual time=2515.787..2515.788 rows=9 loops=3)
                                             Buckets: 131072  Batches: 1  Memory Usage: 1025kB
                                             ->  GroupAggregate  (cost=0.43..451977.19 rows=126825 width=4) (actual time=608.052..2515.758 rows=9 loops=3)
                                                   Group Key: lineitem_1.l_orderkey
                                                   Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
                                                   Rows Removed by Filter: 1499991
                                                   ->  Index Scan using lineitem_pkey on lineitem lineitem_1  (cost=0.43..416256.96 rows=6002623 width=9) (actual time=0.043..1399.708 rows=6001215 loops=3)
                                 ->  Parallel Hash  (cost=4225.00..4225.00 rows=62500 width=23) (actual time=39.601..39.602 rows=50000 loops=3)
                                       Buckets: 262144  Batches: 1  Memory Usage: 10304kB
                                       ->  Parallel Seq Scan on customer  (cost=0.00..4225.00 rows=62500 width=23) (actual time=0.032..15.561 rows=50000 loops=3)
                           ->  Index Scan using lineitem_pkey on lineitem  (cost=0.43..1.06 rows=16 width=9) (actual time=0.012..0.014 rows=7 loops=9)
                                 Index Cond: (l_orderkey = orders.o_orderkey)
 Planning Time: 1.850 ms
 Execution Time: 2630.023 ms
(30 rows)
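
For reference, this is TPC-H Q18 ("Large Volume Customer"). In its standard form, with the quantity threshold substituted as in this run (314), it reads roughly as follows:

-- TPC-H Q18; the HAVING threshold is a substitution parameter (314 in this run).
SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice,
       sum(l_quantity)
FROM customer, orders, lineitem
WHERE o_orderkey IN (SELECT l_orderkey
                     FROM lineitem
                     GROUP BY l_orderkey
                     HAVING sum(l_quantity) > 314)
  AND c_custkey = o_custkey
  AND o_orderkey = l_orderkey
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
ORDER BY o_totalprice DESC, o_orderdate
LIMIT 100;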

Compared to the query plan with GroupAggregate, both the estimated cost and the execution time are similar; there is no significant difference.


To be honest, I don't quite understand what you are showing. I believe you are simply not allowing the optimizer to generate the alternative aggregation path (GroupAggregate), because that path requires a sort operation, so I don't think this approach is correct.
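
For what it's worth, the usual way to make such a comparison without patching the planner is the enable_* cost settings, which penalize a strategy but still let the planner cost every path. A minimal psql sketch, assuming q18.sql is a hypothetical file holding EXPLAIN (ANALYZE) followed by the Q18 text quoted above:

-- enable_* GUCs bias path costs; they do not remove paths outright.
SET enable_hashagg = off;   -- steer toward sort-based grouping (GroupAggregate)
\i q18.sql
RESET enable_hashagg;

SET enable_sort = off;      -- discourage explicit sorts, favoring HashAggregate
\i q18.sql
RESET enable_sort;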

> I think it was more interesting when I turned off parallelism and tried to build a query plan without AQO, and the execution time there was significantly reduced:
Does turning off parallelism alone bring this significant performance improvement?


You may notice that disabling parallelism results in improved cardinality estimation and therefore a better query plan, since the optimizer selects paths based on their cost. With parallelism disabled, the query plan becomes more accurate.
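
One can observe this without any patch by comparing per-node estimates with actuals in the serial plan; a sketch, reusing the hypothetical q18.sql from above:

-- max_parallel_workers_per_gather = 0 forces a serial plan (standard GUC).
SET max_parallel_workers_per_gather = 0;
\i q18.sql
RESET max_parallel_workers_per_gather;
-- In the output, compare each node's estimated rows= with its actual rows=;
-- e.g. the HashAggregate above was estimated at 507522 rows but produced 9.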
-- 
Regards,
Alena Rybakina
Postgres Professional
