On 3/22/22 13:57, Prajna Shetty wrote:
> ++ _pgsql-performance@postgresql.org_
> <https://www.postgresql.org/list/pgsql-performance/>
>
> Hello Team,
>
> There is a change in the query plan between version 12.4 and version 13,
> resulting in performance slowness post upgrade.
>
>
> * In 12.4, a Sort operation with GroupAggregate is selected, which
> leads to a Merge Join. The query takes ~5 seconds.
> * In 13.5, the optimizer misestimates and, due to the new Disk-Based
> Hash Aggregate feature, prefers a Hash Aggregate over the Sort. This
> blocks the merge join, and a Nested Loop Left Join is chosen
> instead. The query takes ~5 minutes.
>
>
> *NOTE:* Disabling Hash Aggregate at the instance level forces the
> optimizer to choose the merge operation, but such an instance-level
> modification is not possible in terms of application functionality.
>
> This performance issue affects most of our queries. Attached is one of
> the queries and its plan in both versions, in case that helps with
> recreating the issue.
>
It's impossible to comment on those other queries, but chances are the
root cause is the same.
> The version 13 query plan has a lower estimated cost than that of 12.4,
> which implies the 13.5 planner thought it found a better plan, but it
> runs slower and the actual timings show a higher real cost.
>
> 12.4 Version:
> "Merge Right Join (cost=*202198.78..295729.10* rows=1 width=8) (actual
> time=1399.727..*5224.574* rows=296 loops=1)"
>
> 13.5 version:
> "Nested Loop Left Join (cost=*196360.90..287890.45* rows=1 width=8)
> (actual time=3209.577..*371300.693* rows=296 loops=1)"
>
This is not a costing issue. The problem is that we expect 1 row and
calculate the cost for that, but then get 296. And unfortunately a
nested loop degrades much faster than a merge join.
I'm not sure why exactly 12.4 picked a merge join; chances are the
costing formula changed a bit somewhere. But as I said, the problem is
the bogus row cardinality estimates - 12.4 is simply lucky.
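As an aside, if disabling hash aggregation does help, it doesn't have to
be done instance-wide - planner settings can be changed per session, or
per transaction with SET LOCAL, so the rest of the application is
unaffected. A minimal sketch (though fixing the estimates is the better
long-term solution):

```sql
BEGIN;
-- Affects only this transaction; reverts automatically at COMMIT/ROLLBACK.
SET LOCAL enable_hashagg = off;
-- ... run the affected query here ...
COMMIT;
```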
The problem most likely stems from this part:
-> GroupAggregate (cost=0.43..85743.24 rows=1830 width=72)
   (actual time=1.621..3452.034 rows=282179 loops=3)
     Group Key: student_class_detail.aamc_id
     Filter: (max((student_class_detail.class_level_cd)::text) = '4'::text)
     Rows Removed by Filter: 76060
     -> Index Scan using uk_student_class_detail_aamcid_classlevelcd
        on student_class_detail (cost=0.43..74747.61 rows=1284079 width=6)
        (actual time=1.570..2723.014 rows=1272390 loops=3)
          Filter: (class_level_start_dt IS NOT NULL)
          Rows Removed by Filter: 160402
The filter is bound to be misestimated, and the error then snowballs.
Try replacing this part with a temporary table (with pre-aggregated
results) - you can run ANALYZE on it, etc. I'd bet that'll make the
issue go away.
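A rough sketch of that workaround, using the table and column names
visible in the plan above (the exact filter and the join back into the
main query are assumptions, since the full query isn't shown here):

```sql
-- Pre-aggregate into a temp table so the planner sees real data,
-- not a misestimated GroupAggregate.
CREATE TEMP TABLE student_max_class AS
SELECT aamc_id
FROM   student_class_detail
WHERE  class_level_start_dt IS NOT NULL
GROUP  BY aamc_id
HAVING max(class_level_cd::text) = '4';

-- Collect statistics so the join cardinality is estimated accurately.
ANALYZE student_max_class;

-- Then join the main query against student_max_class instead of the
-- aggregating subquery.
```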
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company