Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost - Mailing list pgsql-bugs
From | 萧鸿骏 |
---|---|
Subject | Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost |
Date | |
Msg-id | d29dc8e.5be.196d412d59f.Coremail.23031212454@stu.xidian.edu.cn Whole thread Raw |
In response to | Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost
|
List | pgsql-bugs |
Thank you very much for your reply! May I assume that we need to manually adjust the cost based on the actual situation ofthe system and testing computer? I have been conducting research on SQL performance issues in PG recently, and many of the problems I am currently studyingare related to it. I would like to consult with you. If the default cost value of the optimizer is not set reliably,resulting in the selection of a poor execution plan and significant performance differences, is this considereda direction for optimizer performance optimization? I think if we stand from the user's perspective, this may bedifficult to detect, leading to significant performance losses. — Regards, hongjun xiao 2025-05-15 16:45:03 "Dilip Kumar" <dilipbalaut@gmail.com> 写道: > On Wed, May 14, 2025 at 5:55 PM 萧鸿骏 <23031212454@stu.xidian.edu.cn> wrote: > > > > The following method can also trigger a better plan: > > > > database0=# set hash_mem_multiplier = 4; > > SET > > database0=# EXPLAIN ANALYZE SELECT t0.c0, t2.c0, t5.c0, t4.c0 FROM ONLY t0, t4 CROSS JOIN ONLY t5 JOIN ONLY t2 ON lower_inc(((t2.c0)*(t5.c0)))GROUP BY t0.c0, t2.c0, t5.c0, t4.c0; > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------ > > HashAggregate (cost=8854.70..10204.70 rows=135000 width=53) (actual time=196.753..284.803 rows=184320 loops=1) > > Group Key: t0.c0, t2.c0, t5.c0, t4.c0 > > Batches: 5 Memory Usage: 16433kB Disk Usage: 11328kB > > -> Nested Loop (cost=0.00..7504.70 rows=135000 width=53) (actual time=0.046..52.982 rows=288000 loops=1) > > -> Nested Loop (cost=0.00..304.70 rows=4500 width=40) (actual time=0.039..3.358 rows=9600 loops=1) > > -> Nested Loop (cost=0.00..34.70 rows=180 width=27) (actual time=0.032..0.444 rows=384 loops=1) > > Join Filter: lower_inc((t2.c0 * t5.c0)) > > Rows Removed by Join Filter: 156 > > -> Seq Scan on t2 (cost=0.00..1.20 rows=20 width=14) (actual time=0.017..0.022 rows=20 loops=1) > > -> Seq Scan on t5 (cost=0.00..1.27 rows=27 width=13) (actual time=0.002..0.004 rows=27 loops=20) > > -> Seq Scan on t4 (cost=0.00..1.25 rows=25 width=13) (actual time=0.001..0.004 rows=25 loops=384) > > -> Seq Scan on t0 (cost=0.00..1.30 rows=30 width=13) (actual time=0.001..0.002 rows=30 loops=9600) > > Planning Time: 0.109 ms > > Execution Time: 291.546 ms > > (14 rows) > > By setting a higher value for hash_mem_multiplier, you're essentially > informing the planner that more memory is available for hash-based > operations. During cost estimation, the planner compares the costs of > HashAggregate versus GroupAggregate. With a higher memory limit for > building the hash table, the planner determines that HashAggregate is > more cost-effective and therefore selects it over GroupAggregate. > > However, during further investigation, I hadn't increased > hash_mem_multiplier. Instead, I forced the use of hash aggregation by > disabling other options with SET enable_incremental_sort = off; and > SET enable_sort = off;. Even without adjusting hash_mem_multiplier, > the HashAggregate was significantly cheaper than GroupAggregate. Are > these points that we are overestimating the cost of the hash > aggregate? Not sure, might need to dig further down, but one thing to > note is that the planner cost shows a hash aggregate with 2x costlier > than the group aggregate[1][2], whereas the actual execution shows > that the hash aggregate is 4x faster than the group aggregate. > > Another point to be noted is that as soon as we set the > hash_mem_multiplier to 4, then the HashAggregate planner cost drops > from 20k to 4k [3], then I suspect this could be related to we are > overestimating for the disk access as our default random page cost is > quite high and might not be suited well for the faster disk, and as > soon as I changed the random_page_cost to 1 it chose HashAggregate by > default. > > Summary: IMHO, this doesn't look like an issue, instead, we just need > to adjust the random page cost, which is more suited for the system > under test. > > [1] > Explain, analyze with grouping, aggregate: > Group (cost=332.55..13010.57 rows=135000 width=53) (actual > time=104.887..2426.446 rows=184320.00 loops=1) > Group Key: t0.c0, t2.c0, t5.c0, t4.c0 > Buffers: shared hit=8 > -> Incremental Sort (cost=332.55..11660.57 rows=135000 width=53) > (actual time=104.884..2222.501 rows=288000.00 loops=1) > Sort Key: t0.c0, t2.c0, t5.c0, t4.c0 > Presorted Key: t0.c0 > Full-sort Groups: 30 Sort Method: quicksort Average Memory: > 29kB Peak Memory: 29kB > Pre-sorted Groups: 30 Sort Method: quicksort Average > Memory: 1051kB Peak Memory: 1054kB > Buffers: shared hit=8 > -> Nested Loop (cost=0.14..1780.87 rows=135000 width=53) > (actual time=0.223..77.066 rows=288000.00 loops=1) > Buffers: shared hit=5 > -> Index Only Scan using i0 on t0 (cost=0.14..12.59 > rows=30 width=13) (actual time=0.075..0.217 rows=30.00 loops=1) > Heap Fetches: 30 > Index Searches: 1 > Buffers: shared hit=2 > -> Materialize (cost=0.00..92.03 rows=4500 width=40) > (actual time=0.005..0.819 rows=9600.00 loops=30) > Storage: Memory Maximum Storage: 873kB > Buffers: shared hit=3 > -> Nested Loop (cost=0.00..69.53 rows=4500 > width=40) (actual time=0.132..4.110 rows=9600.00 loops=1) > Buffers: shared hit=3 > -> Nested Loop (cost=0.00..11.97 rows=180 > width=27) (actual time=0.108..0.812 rows=384.00 loops=1) > Join Filter: lower_inc((t2.c0 * t5.c0)) > Rows Removed by Join Filter: 156 > Buffers: shared hit=2 > -> Seq Scan on t5 (cost=0.00..1.27 > rows=27 width=13) (actual time=0.026..0.034 rows=27.00 loops=1) > Buffers: shared hit=1 > -> Materialize (cost=0.00..1.30 > rows=20 width=14) (actual time=0.001..0.003 rows=20.00 loops=27) > Storage: Memory Maximum Storage: 17kB > Buffers: shared hit=1 > -> Seq Scan on t2 > (cost=0.00..1.20 rows=20 width=14) (actual time=0.009..0.013 > rows=20.00 loops=1) > Buffers: shared hit=1 > -> Materialize (cost=0.00..1.38 rows=25 > width=13) (actual time=0.000..0.002 rows=25.00 loops=384) > Storage: Memory Maximum Storage: 18kB > Buffers: shared hit=1 > -> Seq Scan on t4 (cost=0.00..1.25 > rows=25 width=13) (actual time=0.019..0.023 rows=25.00 loops=1) > Buffers: shared hit=1 > Planning: > Buffers: shared hit=104 > Planning Time: 2.162 ms > Execution Time: 2449.420 ms > > [2] > Explain, analyze with Hash aggregate: (SET enable_incremental_sort = > off; and SET enable_sort = off;) > > HashAggregate (cost=16355.28..20342.00 rows=135000 width=53) (actual > time=485.060..690.215 rows=184320.00 loops=1) > Group Key: t0.c0, t2.c0, t5.c0, t4.c0 > Planned Partitions: 4 Batches: 5 Memory Usage: 8249kB Disk Usage: 15568kB > Buffers: shared hit=4, temp read=1739 written=3234 > -> Nested Loop (cost=0.00..1758.41 rows=135000 width=53) (actual > time=0.191..56.042 rows=288000.00 loops=1) > Buffers: shared hit=4 > -> Nested Loop (cost=0.00..69.53 rows=4500 width=40) > (actual time=0.164..2.533 rows=9600.00 loops=1) > Buffers: shared hit=3 > -> Nested Loop (cost=0.00..11.97 rows=180 width=27) > (actual time=0.127..0.602 rows=384.00 loops=1) > Join Filter: lower_inc((t2.c0 * t5.c0)) > Rows Removed by Join Filter: 156 > Buffers: shared hit=2 > -> Seq Scan on t5 (cost=0.00..1.27 rows=27 > width=13) (actual time=0.035..0.056 rows=27.00 loops=1) > Buffers: shared hit=1 > -> Materialize (cost=0.00..1.30 rows=20 > width=14) (actual time=0.001..0.003 rows=20.00 loops=27) > Storage: Memory Maximum Storage: 17kB > Buffers: shared hit=1 > -> Seq Scan on t2 (cost=0.00..1.20 > rows=20 width=14) (actual time=0.011..0.015 rows=20.00 loops=1) > Buffers: shared hit=1 > -> Materialize (cost=0.00..1.38 rows=25 width=13) > (actual time=0.000..0.001 rows=25.00 loops=384) > Storage: Memory Maximum Storage: 18kB > Buffers: shared hit=1 > -> Seq Scan on t4 (cost=0.00..1.25 rows=25 > width=13) (actual time=0.026..0.031 rows=25.00 loops=1) > Buffers: shared hit=1 > -> Materialize (cost=0.00..1.45 rows=30 width=13) (actual > time=0.000..0.001 rows=30.00 loops=9600) > Storage: Memory Maximum Storage: 18kB > Buffers: shared hit=1 > -> Seq Scan on t0 (cost=0.00..1.30 rows=30 width=13) > (actual time=0.019..0.024 rows=30.00 loops=1) > Buffers: shared hit=1 > Planning: > Buffers: shared hit=103 > Planning Time: 2.129 ms > Execution Time: 698.681 ms > > [3] > HashAggregate (cost=3108.41..4458.41 rows=135000 width=53) (actual > time=489.975..612.891 rows=184320.00 loops=1) > Group Key: t0.c0, t2.c0, t5.c0, t4.c0 > Batches: 5 Memory Usage: 16441kB Disk Usage: 7504kB > Buffers: shared hit=4, temp read=790 written=1523 > -> Nested Loop (cost=0.00..1758.41 rows=135000 width=53) (actual > time=0.334..55.897 rows=288000.00 loops=1) > Buffers: shared hit=4 > -> Nested Loop (cost=0.00..69.53 rows=4500 width=40) > (actual time=0.302..2.675 rows=9600.00 loops=1) > Buffers: shared hit=3 > -> Nested Loop (cost=0.00..11.97 rows=180 width=27) > (actual time=0.254..0.699 rows=384.00 loops=1) > Join Filter: lower_inc((t2.c0 * t5.c0)) > Rows Removed by Join Filter: 156 > Buffers: shared hit=2 > -> Seq Scan on t5 (cost=0.00..1.27 rows=27 > width=13) (actual time=0.111..0.121 rows=27.00 loops=1) > Buffers: shared hit=1 > -> Materialize (cost=0.00..1.30 rows=20 > width=14) (actual time=0.002..0.004 rows=20.00 loops=27) > Storage: Memory Maximum Storage: 17kB > Buffers: shared hit=1 > -> Seq Scan on t2 (cost=0.00..1.20 > rows=20 width=14) (actual time=0.033..0.039 rows=20.00 loops=1) > Buffers: shared hit=1 > -> Materialize (cost=0.00..1.38 rows=25 width=13) > (actual time=0.000..0.001 rows=25.00 loops=384) > Storage: Memory Maximum Storage: 18kB > Buffers: shared hit=1 > -> Seq Scan on t4 (cost=0.00..1.25 rows=25 > width=13) (actual time=0.035..0.053 rows=25.00 loops=1) > Buffers: shared hit=1 > -> Materialize (cost=0.00..1.45 rows=30 width=13) (actual > time=0.000..0.001 rows=30.00 loops=9600) > Storage: Memory Maximum Storage: 18kB > Buffers: shared hit=1 > -> Seq Scan on t0 (cost=0.00..1.30 rows=30 width=13) > (actual time=0.023..0.030 rows=30.00 loops=1) > Buffers: shared hit=1 > Planning Time: 0.763 ms > Execution Time: 622.127 ms > (31 rows) > > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com
pgsql-bugs by date: