1) no sequence scan noticed from pg_stat_user_tables ( hope stats are accurate in postgres 16) if parameter sniffing happens the possibility of going to sequence scan is more right.
2) no blockings or IO issue during the time.
3) even with limit clause if touch all partitions also it could have been completed in milliseconds as this is just one record.
4) auto_explain in prod we cannot enable as this is expensive and with high TPS we may face latency issues and lower environment this issue cannot be reproduced,( this is happening out of Million one case)
This looks puzzle to us, just in case anyone experianced pls share your experience.
Regards,
Ravi
On Thu, 7 Nov, 2024, 3:41 am David Mullineux, <dmullx@gmail.com> wrote:
It might be worth eliminating the use of cached plans here. Is your app using prepared statements at all?
Point is that if the optimizer sees the same prepared query , 5 times, the it locks the plan that it found at that time. This is a good trade off as it avoids costly planning-time for repetitive queries. But if you are manually querying, the a custom plan will be generated anew.
A quick analyze of the table should reset the stats and invalidate any cached plans.
This may not be your problem just worth eliminating it from the list of potential causes.
One of the queries, which retrieves a single record from a table with 16 hash partitions, is taking more than 10 seconds to execute. In contrast, when we run the same query manually, it completes within milliseconds. This issue is causing exhaustion of the application pools. Do we have any bugs in postgrs16 hash partitions? Please find the attached log, table, and execution plan.
size of the each partitions : 300GB Index Size : 12GB
Postgres Version : 16.x Shared Buffers : 75 GB Effective_cache : 175 GB Work _mem : 4MB Max_connections : 3000
OS : Ubuntu 22.04 Ram : 384 GB CPU : 64
Please let us know if you need any further information or if there are additional details required.