Re: Performance Issue on Query 18 of TPC-H Benchmark - Mailing list pgsql-bugs

From:           Alena Rybakina
Subject:        Re: Performance Issue on Query 18 of TPC-H Benchmark
Msg-id:         45323a23-47d9-46f5-8e1b-8f0c961a4fad@postgrespro.ru
In response to: Re: Performance Issue on Query 18 of TPC-H Benchmark (Andrei Lepikhov <lepihov@gmail.com>)
List:           pgsql-bugs
Hi!
On 10/16/24 01:28, Ba Jinsheng wrote:
> The major difference between the two query plans is that the first one has an additional *SORT*. I believe the second query plan is more efficient. Similar to my last report, perhaps we can optimize the code to enable it.

I would like to know whether you can improve that case by switching from the sorted group to a hashed one.
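A quick way to test that, assuming nothing else is pinned in your setup, is to discourage the sorted path and give the hash table enough memory, e.g.:

set enable_sort = off;    -- discourage sorted GroupAggregate paths
set work_mem = '256MB';   -- arbitrary value, just to keep HashAggregate in memory
explain (analyze) <Q18>;  -- rerun the query and compare the plans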
I see a huge underestimation caused by the HAVING clause on the aggregate. It would be interesting to correct that estimate and observe what happens.
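For reference, TPC-H Q18 looks roughly like this (the 314 threshold matches the Filter lines in the plans below; substitution parameters vary between benchmark runs):

select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_orderkey in (select l_orderkey
                     from lineitem
                     group by l_orderkey
                     having sum(l_quantity) > 314)
  and c_custkey = o_custkey
  and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
limit 100;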
Unfortunately, my laptop is not very fast, so my execution time is 25822.899 ms.
The initial query plan on my laptop is the same:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=668833.97..668834.22 rows=100 width=71) (actual time=25814.870..25822.461 rows=9 loops=1)
-> Sort (cost=668833.97..670117.12 rows=513262 width=71) (actual time=25814.867..25822.454 rows=9 loops=1)
Sort Key: orders.o_totalprice DESC, orders.o_orderdate
Sort Method: quicksort Memory: 25kB
-> Finalize GroupAggregate (cost=584343.41..649217.47 rows=513262 width=71) (actual time=25814.736..25822.399 rows=9 loops=1)
Group Key: customer.c_custkey, orders.o_orderkey
-> Gather Merge (cost=584343.41..638524.51 rows=427718 width=71) (actual time=25814.716..25822.328 rows=9 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=583343.39..588155.22 rows=213859 width=71) (actual time=25792.566..25792.609 rows=3 loops=3)
Group Key: customer.c_custkey, orders.o_orderkey
-> Sort (cost=583343.39..583878.04 rows=213859 width=44) (actual time=25792.512..25792.527 rows=21 loops=3)
Sort Key: customer.c_custkey, orders.o_orderkey
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 26kB
Worker 1: Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=458633.58..557830.13 rows=213859 width=44) (actual time=25480.327..25792.187 rows=21 loops=3)
-> Parallel Hash Join (cost=458633.15..492800.08 rows=53450 width=43) (actual time=25480.163..25791.800 rows=3 loops=3)
Hash Cond: (orders.o_custkey = customer.c_custkey)
-> Hash Join (cost=453626.90..487653.53 rows=53450 width=24) (actual time=25367.158..25677.611 rows=3 loops=3)
Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
-> Parallel Seq Scan on orders (cost=0.00..32386.00 rows=625000 width=20) (actual time=0.110..237.788 rows=500000 loops=3)
-> Hash (cost=452023.40..452023.40 rows=128280 width=4) (actual time=25157.711..25157.713 rows=9 loops=3)
Buckets: 131072 Batches: 1 Memory Usage: 1025kB
-> GroupAggregate (cost=0.43..452023.40 rows=128280 width=4) (actual time=5918.735..25157.610 rows=9 loops=3)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed by Filter: 1499991
-> Index Scan using lineitem_pkey on lineitem lineitem_1 (cost=0.43..416242.51 rows=6001659 width=9) (actual time=0.186..11092.476 rows=6001215 loops=3)
-> Parallel Hash (cost=4225.00..4225.00 rows=62500 width=23) (actual time=110.867..110.868 rows=50000 loops=3)
Buckets: 262144 Batches: 1 Memory Usage: 10304kB
-> Parallel Seq Scan on customer (cost=0.00..4225.00 rows=62500 width=23) (actual time=0.071..49.232 rows=50000 loops=3)
-> Index Scan using lineitem_pkey on lineitem (cost=0.43..1.06 rows=16 width=9) (actual time=0.096..0.109 rows=7 loops=9)
Index Cond: (l_orderkey = orders.o_orderkey)
Planning Time: 5.025 ms
Execution Time: 25822.899 ms
(36 rows)
To get a better plan, I ran the query with the AQO [0] extension, and it cut the execution time in half. Unfortunately, AQO doesn't remember the cardinality of hash nodes, so, to be honest, I hard-coded the correct cardinality in the source.
I know this is the wrong way to do it, but I haven't had enough time to fix it properly. I also disabled parallelism, so my query plan looks like this:
set max_parallel_maintenance_workers = 0;
set max_parallel_workers = 0;
set max_parallel_workers_per_gather = 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=451107.39..451107.48 rows=36 width=71) (actual time=12582.114..12582.136 rows=9 loops=1)
-> Sort (cost=451107.39..451107.48 rows=36 width=71) (actual time=12582.112..12582.118 rows=9 loops=1)
Sort Key: orders.o_totalprice DESC, orders.o_orderdate
Sort Method: quicksort Memory: 25kB
-> GroupAggregate (cost=451105.65..451106.46 rows=36 width=71) (actual time=12581.981..12582.082 rows=9 loops=1)
Group Key: customer.c_custkey, orders.o_orderkey
-> Sort (cost=451105.65..451105.74 rows=36 width=44) (actual time=12581.951..12581.969 rows=63 loops=1)
Sort Key: customer.c_custkey, orders.o_orderkey
Sort Method: quicksort Memory: 28kB
-> Nested Loop (cost=1.71..451104.72 rows=36 width=44) (actual time=2704.881..12581.823 rows=63 loops=1)
-> Nested Loop (cost=1.28..451093.77 rows=9 width=43) (actual time=2704.826..12581.457 rows=9 loops=1)
-> Nested Loop (cost=0.86..451089.74 rows=9 width=24) (actual time=2704.741..12581.024 rows=9 loops=1)
-> GroupAggregate (cost=0.43..451013.73 rows=9 width=4) (actual time=2697.354..12522.126 rows=9 loops=1)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed by Filter: 1499991
-> Index Scan using lineitem_pkey on lineitem lineitem_1 (cost=0.43..416226.85 rows=6000615 width=9) (actual time=0.030..5515.665 rows=6001215 loops=1)
-> Index Scan using orders_pkey on orders (cost=0.43..8.45 rows=1 width=20) (actual time=6.530..6.530 rows=1 loops=9)
Index Cond: (o_orderkey = lineitem_1.l_orderkey)
-> Index Scan using customer_pkey on customer (cost=0.42..0.45 rows=1 width=23) (actual time=0.039..0.039 rows=1 loops=9)
Index Cond: (c_custkey = orders.o_custkey)
-> Index Scan using lineitem_pkey on lineitem (cost=0.43..1.06 rows=16 width=9) (actual time=0.026..0.033 rows=7 loops=9)
Index Cond: (l_orderkey = orders.o_orderkey)
Planning Time: 1.403 ms
Execution Time: 12582.321 ms
(25 rows)
[0] https://github.com/postgrespro/aqo
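If anyone wants to reproduce the AQO run, a minimal setup is roughly as follows (the aqo.mode value is an assumption on my part; see the README at [0] for the modes supported by your version):

# in postgresql.conf, then restart
shared_preload_libraries = 'aqo'

-- in psql, after the restart
CREATE EXTENSION aqo;
SET aqo.mode = 'intelligent';  -- let AQO learn cardinalities from executed queries
-- then run the query several times so AQO can refine its estimates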
More interestingly, when I turned off parallelism and built the query plan without AQO, the execution time was still significantly reduced:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=722284.69..722284.94 rows=100 width=71) (actual time=16251.401..16251.413 rows=9 loops=1)
-> Sort (cost=722284.69..723567.84 rows=513262 width=71) (actual time=16251.399..16251.408 rows=9 loops=1)
Sort Key: orders.o_totalprice DESC, orders.o_orderdate
Sort Method: quicksort Memory: 25kB
-> GroupAggregate (cost=512218.34..702668.18 rows=513262 width=71) (actual time=16227.765..16251.371 rows=9 loops=1)
Group Key: customer.c_custkey, orders.o_orderkey
-> Incremental Sort (cost=512218.34..692402.94 rows=513262 width=44) (actual time=16227.735..16251.241 rows=63 loops=1)
Sort Key: customer.c_custkey, orders.o_orderkey
Presorted Key: customer.c_custkey
Full-sort Groups: 2 Sort Method: quicksort Average Memory: 27kB Peak Memory: 27kB
-> Nested Loop (cost=512217.20..678432.66 rows=513262 width=44) (actual time=16094.663..16251.114 rows=63 loops=1)
-> Merge Join (cost=512216.77..522360.54 rows=128280 width=43) (actual time=16094.596..16250.531 rows=9 loops=1)
Merge Cond: (customer.c_custkey = orders.o_custkey)
-> Index Scan using customer_pkey on customer (cost=0.42..7524.45 rows=150000 width=23) (actual time=0.039..148.356 rows=147198 loops=1)
-> Materialize (cost=512216.29..512857.69 rows=128280 width=24) (actual time=16068.493..16068.529 rows=9 loops=1)
-> Sort (cost=512216.29..512536.99 rows=128280 width=24) (actual time=16068.478..16068.496 rows=9 loops=1)
Sort Key: orders.o_custkey
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=453626.90..498700.41 rows=128280 width=24) (actual time=15309.799..16068.390 rows=9 loops=1)
Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
-> Seq Scan on orders (cost=0.00..41136.00 rows=1500000 width=20) (actual time=0.011..439.488 rows=1500000 loops=1)
-> Hash (cost=452023.40..452023.40 rows=128280 width=4) (actual time=15104.560..15104.562 rows=9 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 1025kB
-> GroupAggregate (cost=0.43..452023.40 rows=128280 width=4) (actual time=3731.232..15104.495 rows=9 loops=1)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed by Filter: 1499991
-> Index Scan using lineitem_pkey on lineitem lineitem_1 (cost=0.43..416242.51 rows=6001659 width=9) (actual time=0.034..6936.304 rows=6001215 loops=1)
-> Index Scan using lineitem_pkey on lineitem (cost=0.43..1.06 rows=16 width=9) (actual time=0.045..0.053 rows=7 loops=9)
Index Cond: (l_orderkey = orders.o_orderkey)
Planning Time: 1.782 ms
Execution Time: 16251.698 ms
(32 rows)
Looking at the code, I think the cost-estimation problem in the aggregation nodes is caused by an incorrect selectivity estimate (in the cost_agg function) and could be solved using extended statistics. But to be honest, I'm still investigating this.
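For context, cost_agg() in src/backend/optimizer/path/costsize.c estimates the number of groups surviving the HAVING clause roughly like this (paraphrased; details vary by version):

/* quals here are the HAVING clauses attached to the Agg path */
if (quals)
{
    output_tuples = clamp_row_est(output_tuples *
        clauselist_selectivity(root, quals, 0, JOIN_INNER, NULL));
}

Since there are no column statistics for an aggregate expression like sum(l_quantity), clauselist_selectivity() falls back to a default selectivity, which is where the 128280-row estimate (versus 9 actual rows) in the plans above comes from.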
--
Regards,
Alena Rybakina
Postgres Professional