Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost - Mailing list pgsql-bugs

From 萧鸿骏
Subject Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost
Date
Msg-id d29dc8e.5be.196d412d59f.Coremail.23031212454@stu.xidian.edu.cn
Whole thread Raw
In response to Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost
List pgsql-bugs
Thank you very much for your reply! May I assume that we need to manually adjust the cost based on the actual situation
ofthe system and testing computer? 

I have been conducting research on SQL performance issues in PG recently, and many of the problems I am currently
studyingare related to it. I would like to consult with you. If the default cost value of the optimizer is not set
reliably,resulting in the selection of a poor execution plan and significant performance differences, is this
considereda direction for optimizer performance optimization? I think if we stand from the user's perspective, this may
bedifficult to detect, leading to significant performance losses. 

—
Regards,
hongjun xiao

2025-05-15 16:45:03 "Dilip Kumar" <dilipbalaut@gmail.com> 写道:
> On Wed, May 14, 2025 at 5:55 PM 萧鸿骏 <23031212454@stu.xidian.edu.cn> wrote:
> >
> > The following method can also trigger a better plan:
> >
> > database0=# set hash_mem_multiplier = 4;
> > SET
> > database0=# EXPLAIN ANALYZE SELECT t0.c0, t2.c0, t5.c0, t4.c0 FROM ONLY t0, t4 CROSS JOIN ONLY t5 JOIN ONLY t2 ON
lower_inc(((t2.c0)*(t5.c0)))GROUP BY t0.c0, t2.c0, t5.c0, t4.c0;
 
> >                                                        QUERY PLAN
> >
------------------------------------------------------------------------------------------------------------------------
> >  HashAggregate  (cost=8854.70..10204.70 rows=135000 width=53) (actual time=196.753..284.803 rows=184320 loops=1)
> >    Group Key: t0.c0, t2.c0, t5.c0, t4.c0
> >    Batches: 5  Memory Usage: 16433kB  Disk Usage: 11328kB
> >    ->  Nested Loop  (cost=0.00..7504.70 rows=135000 width=53) (actual time=0.046..52.982 rows=288000 loops=1)
> >          ->  Nested Loop  (cost=0.00..304.70 rows=4500 width=40) (actual time=0.039..3.358 rows=9600 loops=1)
> >                ->  Nested Loop  (cost=0.00..34.70 rows=180 width=27) (actual time=0.032..0.444 rows=384 loops=1)
> >                      Join Filter: lower_inc((t2.c0 * t5.c0))
> >                      Rows Removed by Join Filter: 156
> >                      ->  Seq Scan on t2  (cost=0.00..1.20 rows=20 width=14) (actual time=0.017..0.022 rows=20
loops=1)
> >                      ->  Seq Scan on t5  (cost=0.00..1.27 rows=27 width=13) (actual time=0.002..0.004 rows=27
loops=20)
> >                ->  Seq Scan on t4  (cost=0.00..1.25 rows=25 width=13) (actual time=0.001..0.004 rows=25 loops=384)
> >          ->  Seq Scan on t0  (cost=0.00..1.30 rows=30 width=13) (actual time=0.001..0.002 rows=30 loops=9600)
> >  Planning Time: 0.109 ms
> >  Execution Time: 291.546 ms
> > (14 rows)
> 
> By setting a higher value for hash_mem_multiplier, you're essentially
> informing the planner that more memory is available for hash-based
> operations. During cost estimation, the planner compares the costs of
> HashAggregate versus GroupAggregate. With a higher memory limit for
> building the hash table, the planner determines that HashAggregate is
> more cost-effective and therefore selects it over GroupAggregate.
> 
> However, during further investigation, I hadn't increased
> hash_mem_multiplier. Instead, I forced the use of hash aggregation by
> disabling other options with SET enable_incremental_sort = off; and
> SET enable_sort = off;. Even without adjusting hash_mem_multiplier,
> the HashAggregate was significantly cheaper than GroupAggregate.  Are
> these points that we are overestimating the cost of the hash
> aggregate?  Not sure, might need to dig further down, but one thing to
> note is that the planner cost shows a hash aggregate with 2x costlier
> than the group aggregate[1][2], whereas the actual execution shows
> that the hash aggregate is 4x faster than the group aggregate.
> 
> Another point to be noted is that as soon as we set the
> hash_mem_multiplier  to 4, then the HashAggregate planner cost drops
> from 20k to 4k [3], then I suspect this could be related to we are
> overestimating for the disk access as our default random page cost is
> quite high and might not be suited well for the faster disk, and as
> soon as I changed the random_page_cost to 1 it chose HashAggregate by
> default.
> 
> Summary: IMHO, this doesn't look like an issue, instead, we just need
> to adjust the random page cost, which is more suited for the system
> under test.
> 
> [1]
> Explain, analyze with grouping, aggregate:
>  Group  (cost=332.55..13010.57 rows=135000 width=53) (actual
> time=104.887..2426.446 rows=184320.00 loops=1)
>    Group Key: t0.c0, t2.c0, t5.c0, t4.c0
>    Buffers: shared hit=8
>    ->  Incremental Sort  (cost=332.55..11660.57 rows=135000 width=53)
> (actual time=104.884..2222.501 rows=288000.00 loops=1)
>          Sort Key: t0.c0, t2.c0, t5.c0, t4.c0
>          Presorted Key: t0.c0
>          Full-sort Groups: 30  Sort Method: quicksort  Average Memory:
> 29kB  Peak Memory: 29kB
>          Pre-sorted Groups: 30  Sort Method: quicksort  Average
> Memory: 1051kB  Peak Memory: 1054kB
>          Buffers: shared hit=8
>          ->  Nested Loop  (cost=0.14..1780.87 rows=135000 width=53)
> (actual time=0.223..77.066 rows=288000.00 loops=1)
>                Buffers: shared hit=5
>                ->  Index Only Scan using i0 on t0  (cost=0.14..12.59
> rows=30 width=13) (actual time=0.075..0.217 rows=30.00 loops=1)
>                      Heap Fetches: 30
>                      Index Searches: 1
>                      Buffers: shared hit=2
>                ->  Materialize  (cost=0.00..92.03 rows=4500 width=40)
> (actual time=0.005..0.819 rows=9600.00 loops=30)
>                      Storage: Memory  Maximum Storage: 873kB
>                      Buffers: shared hit=3
>                      ->  Nested Loop  (cost=0.00..69.53 rows=4500
> width=40) (actual time=0.132..4.110 rows=9600.00 loops=1)
>                            Buffers: shared hit=3
>                            ->  Nested Loop  (cost=0.00..11.97 rows=180
> width=27) (actual time=0.108..0.812 rows=384.00 loops=1)
>                                  Join Filter: lower_inc((t2.c0 * t5.c0))
>                                  Rows Removed by Join Filter: 156
>                                  Buffers: shared hit=2
>                                  ->  Seq Scan on t5  (cost=0.00..1.27
> rows=27 width=13) (actual time=0.026..0.034 rows=27.00 loops=1)
>                                        Buffers: shared hit=1
>                                  ->  Materialize  (cost=0.00..1.30
> rows=20 width=14) (actual time=0.001..0.003 rows=20.00 loops=27)
>                                        Storage: Memory  Maximum Storage: 17kB
>                                        Buffers: shared hit=1
>                                        ->  Seq Scan on t2
> (cost=0.00..1.20 rows=20 width=14) (actual time=0.009..0.013
> rows=20.00 loops=1)
>                                              Buffers: shared hit=1
>                            ->  Materialize  (cost=0.00..1.38 rows=25
> width=13) (actual time=0.000..0.002 rows=25.00 loops=384)
>                                  Storage: Memory  Maximum Storage: 18kB
>                                  Buffers: shared hit=1
>                                  ->  Seq Scan on t4  (cost=0.00..1.25
> rows=25 width=13) (actual time=0.019..0.023 rows=25.00 loops=1)
>                                        Buffers: shared hit=1
>  Planning:
>    Buffers: shared hit=104
>  Planning Time: 2.162 ms
>  Execution Time: 2449.420 ms
> 
> [2]
> Explain, analyze with Hash aggregate: (SET enable_incremental_sort =
> off; and SET enable_sort = off;)
> 
> HashAggregate  (cost=16355.28..20342.00 rows=135000 width=53) (actual
> time=485.060..690.215 rows=184320.00 loops=1)
>    Group Key: t0.c0, t2.c0, t5.c0, t4.c0
>    Planned Partitions: 4  Batches: 5  Memory Usage: 8249kB  Disk Usage: 15568kB
>    Buffers: shared hit=4, temp read=1739 written=3234
>    ->  Nested Loop  (cost=0.00..1758.41 rows=135000 width=53) (actual
> time=0.191..56.042 rows=288000.00 loops=1)
>          Buffers: shared hit=4
>          ->  Nested Loop  (cost=0.00..69.53 rows=4500 width=40)
> (actual time=0.164..2.533 rows=9600.00 loops=1)
>                Buffers: shared hit=3
>                ->  Nested Loop  (cost=0.00..11.97 rows=180 width=27)
> (actual time=0.127..0.602 rows=384.00 loops=1)
>                      Join Filter: lower_inc((t2.c0 * t5.c0))
>                      Rows Removed by Join Filter: 156
>                      Buffers: shared hit=2
>                      ->  Seq Scan on t5  (cost=0.00..1.27 rows=27
> width=13) (actual time=0.035..0.056 rows=27.00 loops=1)
>                            Buffers: shared hit=1
>                      ->  Materialize  (cost=0.00..1.30 rows=20
> width=14) (actual time=0.001..0.003 rows=20.00 loops=27)
>                            Storage: Memory  Maximum Storage: 17kB
>                            Buffers: shared hit=1
>                            ->  Seq Scan on t2  (cost=0.00..1.20
> rows=20 width=14) (actual time=0.011..0.015 rows=20.00 loops=1)
>                                  Buffers: shared hit=1
>                ->  Materialize  (cost=0.00..1.38 rows=25 width=13)
> (actual time=0.000..0.001 rows=25.00 loops=384)
>                      Storage: Memory  Maximum Storage: 18kB
>                      Buffers: shared hit=1
>                      ->  Seq Scan on t4  (cost=0.00..1.25 rows=25
> width=13) (actual time=0.026..0.031 rows=25.00 loops=1)
>                            Buffers: shared hit=1
>          ->  Materialize  (cost=0.00..1.45 rows=30 width=13) (actual
> time=0.000..0.001 rows=30.00 loops=9600)
>                Storage: Memory  Maximum Storage: 18kB
>                Buffers: shared hit=1
>                ->  Seq Scan on t0  (cost=0.00..1.30 rows=30 width=13)
> (actual time=0.019..0.024 rows=30.00 loops=1)
>                      Buffers: shared hit=1
>  Planning:
>    Buffers: shared hit=103
>  Planning Time: 2.129 ms
>  Execution Time: 698.681 ms
> 
> [3]
>  HashAggregate  (cost=3108.41..4458.41 rows=135000 width=53) (actual
> time=489.975..612.891 rows=184320.00 loops=1)
>    Group Key: t0.c0, t2.c0, t5.c0, t4.c0
>    Batches: 5  Memory Usage: 16441kB  Disk Usage: 7504kB
>    Buffers: shared hit=4, temp read=790 written=1523
>    ->  Nested Loop  (cost=0.00..1758.41 rows=135000 width=53) (actual
> time=0.334..55.897 rows=288000.00 loops=1)
>          Buffers: shared hit=4
>          ->  Nested Loop  (cost=0.00..69.53 rows=4500 width=40)
> (actual time=0.302..2.675 rows=9600.00 loops=1)
>                Buffers: shared hit=3
>                ->  Nested Loop  (cost=0.00..11.97 rows=180 width=27)
> (actual time=0.254..0.699 rows=384.00 loops=1)
>                      Join Filter: lower_inc((t2.c0 * t5.c0))
>                      Rows Removed by Join Filter: 156
>                      Buffers: shared hit=2
>                      ->  Seq Scan on t5  (cost=0.00..1.27 rows=27
> width=13) (actual time=0.111..0.121 rows=27.00 loops=1)
>                            Buffers: shared hit=1
>                      ->  Materialize  (cost=0.00..1.30 rows=20
> width=14) (actual time=0.002..0.004 rows=20.00 loops=27)
>                            Storage: Memory  Maximum Storage: 17kB
>                            Buffers: shared hit=1
>                            ->  Seq Scan on t2  (cost=0.00..1.20
> rows=20 width=14) (actual time=0.033..0.039 rows=20.00 loops=1)
>                                  Buffers: shared hit=1
>                ->  Materialize  (cost=0.00..1.38 rows=25 width=13)
> (actual time=0.000..0.001 rows=25.00 loops=384)
>                      Storage: Memory  Maximum Storage: 18kB
>                      Buffers: shared hit=1
>                      ->  Seq Scan on t4  (cost=0.00..1.25 rows=25
> width=13) (actual time=0.035..0.053 rows=25.00 loops=1)
>                            Buffers: shared hit=1
>          ->  Materialize  (cost=0.00..1.45 rows=30 width=13) (actual
> time=0.000..0.001 rows=30.00 loops=9600)
>                Storage: Memory  Maximum Storage: 18kB
>                Buffers: shared hit=1
>                ->  Seq Scan on t0  (cost=0.00..1.30 rows=30 width=13)
> (actual time=0.023..0.030 rows=30.00 loops=1)
>                      Buffers: shared hit=1
>  Planning Time: 0.763 ms
>  Execution Time: 622.127 ms
> (31 rows)
> 
> 
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #18932: PostgreSQL fails to handle EXCEPT operation when the left branch is an Empty Table
Next
From: "David G. Johnston"
Date:
Subject: Re: BUG #18932: PostgreSQL fails to handle EXCEPT operation when the left branch is an Empty Table