Hello Team,
There is a change in the query plan between version 12.4 and version 13, resulting in performance slowness post upgrade.
- In version 12.4, the planner selects a Sort operation with GroupAggregate, which leads to a Merge Join. The query takes ~5 seconds.
- In version 13.5, the optimizer misestimates and, due to the new disk-based Hash Aggregate feature, prefers a Hash Aggregate over the Sort operation. This blocks the Merge Join, and it chooses a Nested Loop Left Join instead. The query takes ~5 minutes.
NOTE: Disabling Hash Aggregate at the instance level forces the optimizer to choose the merge operation, but such an instance-level modification is not possible given the application's functionality.
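For reference, the workaround can be confirmed per session rather than instance-wide; a sketch (the SELECT is a placeholder for the affected query):

```sql
-- Per-session sketch of the workaround (not viable for us instance-wide,
-- but useful for confirming the diagnosis on a single query):
BEGIN;
SET LOCAL enable_hashagg = off;   -- steer the planner away from Hash Aggregate
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;                       -- placeholder for the affected query
COMMIT;                           -- SET LOCAL reverts at transaction end
```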
This performance issue affects most of our queries. Attached is one of the queries along with its plan on both versions, in case that helps in recreating the issue.
The version 13 query plan has a lower estimated cost than the 12.4 plan, which implies the 13.5 planner thought it had found a better plan, but it actually runs slower, as the actual times show.
12.4 Version:
"Merge Right Join (cost=202198.78..295729.10 rows=1 width=8) (actual time=1399.727..5224.574 rows=296 loops=1)"
13.5 version:
"Nested Loop Left Join (cost=196360.90..287890.45 rows=1 width=8) (actual time=3209.577..371300.693 rows=296 loops=1)"
Details:
- It is an AWS Aurora PostgreSQL RDS instance. We have raised a case with AWS, and since this issue appears to be a regression coming from the community PostgreSQL code, we would like to raise a bug here.
- We were upgrading from version 12.4 to version 13.4 and later.
- vCPU: 2 , RAM: 8 GB
- Attached are statistics for all tables in this schema for your reference.
- Attached is metadata for one of the tables, person, for your reference.
We have performed the steps below, but they did not help:
- We performed VACUUM/ANALYZE/REINDEX post upgrade.
- Tweaked work_mem so the Hash Aggregate does not spill to disk. Disk usage went away, but the planner still uses a Hash Aggregate; runtime came down from 5 minutes to 20 seconds (expected ~5 seconds). Attached is the plan after modifying work_mem.
- Disabled enable_seqscan/enable_nestloop.
- Tweaked random_page_cost/seq_page_cost.
- Set default_statistics_target=1000 and then ran VACUUM (ANALYZE, VERBOSE) on selected tables.
- We also tested performance after increasing resources to 4 vCPUs and 32 GB RAM.
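The tuning attempts above can be sketched as session-level settings (the values shown are illustrative, not the exact ones we used):

```sql
-- Illustrative sketch of the settings we experimented with:
SET work_mem = '256MB';                 -- keep the Hash Aggregate in memory
SET enable_seqscan = off;               -- discourage sequential scans
SET enable_nestloop = off;              -- discourage nested-loop joins
SET random_page_cost = 1.1;             -- adjusted page-cost ratio
SET seq_page_cost = 1.0;
SET default_statistics_target = 1000;   -- richer column statistics
VACUUM (ANALYZE, VERBOSE) person;       -- refresh stats on affected tables
```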
Could you please check and confirm whether this incorrect cost estimation is a known concern in version 13, where in some cases the optimizer calculates and prefers a Hash Aggregate -> Nested Loop Left Join instead of a Merge Join?
Thanks & Regards,
Prajna Shetty
Technical Specialist,
Data Platform Support & Delivery
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Monday, March 21, 2022 7:29 PM
To: Prajna Shetty <Prajna.Shetty@mindtree.com>
Cc: pgsql-bugs@lists.postgresql.org; Beenu Sharma <Beenu.Sharma@mindtree.com>
Subject: Re: Performance issue post upgrade on Version 13 - Incorrect Estimation Cost choosing Hash Aggregate-Nested Left Loop Join
> There is change in query plan in 12.4 version and Version 13 resulting in performance slowness post upgrade.
Standard upgrade methods don't transfer statistics from the old version, so the first question to ask is have you ANALYZE'd the relevant tables since upgrading?
If you have, then to offer useful help with this we'll need to see all the details described in
In any case, this is unlikely to be a bug. The pgsql-performance list would be a more suitable place to discuss it.
regards, tom lane