Thread: Re: Performance Issue with Hash Partition Query Execution in PostgreSQL 16

Re: Performance Issue with Hash Partition Query Execution in PostgreSQL 16

From
David Mullineux
Date:
It might be worth eliminating the use of cached plans here. Is your app using prepared statements at all?  
Point is that if the optimizer sees the same prepared query , 5 times, the  it locks the plan that it found at that time. This is a good trade off as it avoids costly planning-time for repetitive queries. But if you are manually querying, the  a custom plan will be generated  anew.
A quick analyze of the table should reset the stats and invalidate any cached plans.
This may not be your problem  just worth eliminating it from the list of potential causes.

On Wed, 6 Nov 2024, 17:14 Ramakrishna m, <ram.pgdb@gmail.com> wrote:
Hi Team,

One of the queries, which retrieves a single record from a table with 16 hash partitions, is taking more than 10 seconds to execute. In contrast, when we run the same query manually, it completes within milliseconds. This issue is causing exhaustion of the application pools. Do we have any bugs in postgrs16 hash partitions? Please find the attached log, table, and execution plan. 

size of the each partitions : 300GB 
Index Size : 12GB

Postgres Version : 16.x
Shared Buffers : 75 GB
Effective_cache :  175 GB
Work _mem : 4MB
Max_connections : 3000

OS  : Ubuntu 22.04
Ram : 384 GB
CPU : 64

Please let us know if you need any further information or if there are additional details required.  


Regards,
Ram.
Hi ,

Thanks for the suggestions.

Two more observations:

1) no sequence scan noticed from pg_stat_user_tables ( hope stats are accurate in postgres 16) if parameter sniffing happens the possibility of going to  sequence scan is more right.

2) no blockings or IO issue during the time.

3) even with limit clause if touch all partitions also it could have been completed in milliseconds as this is just one record.

4) auto_explain in prod we cannot enable as this is expensive and with high TPS we may face latency issues and lower environment this issue cannot be reproduced,( this is happening out of Million one case)

This looks puzzle to us, just in case anyone experianced pls share your experience.

Regards,
Ravi

On Thu, 7 Nov, 2024, 3:41 am David Mullineux, <dmullx@gmail.com> wrote:
It might be worth eliminating the use of cached plans here. Is your app using prepared statements at all?  
Point is that if the optimizer sees the same prepared query , 5 times, the  it locks the plan that it found at that time. This is a good trade off as it avoids costly planning-time for repetitive queries. But if you are manually querying, the  a custom plan will be generated  anew.
A quick analyze of the table should reset the stats and invalidate any cached plans.
This may not be your problem  just worth eliminating it from the list of potential causes.

On Wed, 6 Nov 2024, 17:14 Ramakrishna m, <ram.pgdb@gmail.com> wrote:
Hi Team,

One of the queries, which retrieves a single record from a table with 16 hash partitions, is taking more than 10 seconds to execute. In contrast, when we run the same query manually, it completes within milliseconds. This issue is causing exhaustion of the application pools. Do we have any bugs in postgrs16 hash partitions? Please find the attached log, table, and execution plan. 

size of the each partitions : 300GB 
Index Size : 12GB

Postgres Version : 16.x
Shared Buffers : 75 GB
Effective_cache :  175 GB
Work _mem : 4MB
Max_connections : 3000

OS  : Ubuntu 22.04
Ram : 384 GB
CPU : 64

Please let us know if you need any further information or if there are additional details required.  


Regards,
Ram.

Re: Performance Issue with Hash Partition Query Execution in PostgreSQL 16

From
David Mullineux
Date:
Just spotted a potential problem. The indexed column is a bigint. Are you, in your prepared statement passing a string or a big int ?
I notice your plan is doing an implicit type conversion when you run it manually.
Sometimes the wrong type will make it not use the index.

On Fri, 8 Nov 2024, 03:07 ravi k, <ravisql09@gmail.com> wrote:
Hi ,

Thanks for the suggestions.

Two more observations:

1) no sequence scan noticed from pg_stat_user_tables ( hope stats are accurate in postgres 16) if parameter sniffing happens the possibility of going to  sequence scan is more right.

2) no blockings or IO issue during the time.

3) even with limit clause if touch all partitions also it could have been completed in milliseconds as this is just one record.

4) auto_explain in prod we cannot enable as this is expensive and with high TPS we may face latency issues and lower environment this issue cannot be reproduced,( this is happening out of Million one case)

This looks puzzle to us, just in case anyone experianced pls share your experience.

Regards,
Ravi

On Thu, 7 Nov, 2024, 3:41 am David Mullineux, <dmullx@gmail.com> wrote:
It might be worth eliminating the use of cached plans here. Is your app using prepared statements at all?  
Point is that if the optimizer sees the same prepared query , 5 times, the  it locks the plan that it found at that time. This is a good trade off as it avoids costly planning-time for repetitive queries. But if you are manually querying, the  a custom plan will be generated  anew.
A quick analyze of the table should reset the stats and invalidate any cached plans.
This may not be your problem  just worth eliminating it from the list of potential causes.

On Wed, 6 Nov 2024, 17:14 Ramakrishna m, <ram.pgdb@gmail.com> wrote:
Hi Team,

One of the queries, which retrieves a single record from a table with 16 hash partitions, is taking more than 10 seconds to execute. In contrast, when we run the same query manually, it completes within milliseconds. This issue is causing exhaustion of the application pools. Do we have any bugs in postgrs16 hash partitions? Please find the attached log, table, and execution plan. 

size of the each partitions : 300GB 
Index Size : 12GB

Postgres Version : 16.x
Shared Buffers : 75 GB
Effective_cache :  175 GB
Work _mem : 4MB
Max_connections : 3000

OS  : Ubuntu 22.04
Ram : 384 GB
CPU : 64

Please let us know if you need any further information or if there are additional details required.  


Regards,
Ram.
Sorry, it was typo. Bind variable is bigint only.

Thanks 

On Fri, 8 Nov, 2024, 7:09 pm David Mullineux, <dmullx@gmail.com> wrote:
Just spotted a potential problem. The indexed column is a bigint. Are you, in your prepared statement passing a string or a big int ?
I notice your plan is doing an implicit type conversion when you run it manually.
Sometimes the wrong type will make it not use the index.

On Fri, 8 Nov 2024, 03:07 ravi k, <ravisql09@gmail.com> wrote:
Hi ,

Thanks for the suggestions.

Two more observations:

1) no sequence scan noticed from pg_stat_user_tables ( hope stats are accurate in postgres 16) if parameter sniffing happens the possibility of going to  sequence scan is more right.

2) no blockings or IO issue during the time.

3) even with limit clause if touch all partitions also it could have been completed in milliseconds as this is just one record.

4) auto_explain in prod we cannot enable as this is expensive and with high TPS we may face latency issues and lower environment this issue cannot be reproduced,( this is happening out of Million one case)

This looks puzzle to us, just in case anyone experianced pls share your experience.

Regards,
Ravi

On Thu, 7 Nov, 2024, 3:41 am David Mullineux, <dmullx@gmail.com> wrote:
It might be worth eliminating the use of cached plans here. Is your app using prepared statements at all?  
Point is that if the optimizer sees the same prepared query , 5 times, the  it locks the plan that it found at that time. This is a good trade off as it avoids costly planning-time for repetitive queries. But if you are manually querying, the  a custom plan will be generated  anew.
A quick analyze of the table should reset the stats and invalidate any cached plans.
This may not be your problem  just worth eliminating it from the list of potential causes.

On Wed, 6 Nov 2024, 17:14 Ramakrishna m, <ram.pgdb@gmail.com> wrote:
Hi Team,

One of the queries, which retrieves a single record from a table with 16 hash partitions, is taking more than 10 seconds to execute. In contrast, when we run the same query manually, it completes within milliseconds. This issue is causing exhaustion of the application pools. Do we have any bugs in postgrs16 hash partitions? Please find the attached log, table, and execution plan. 

size of the each partitions : 300GB 
Index Size : 12GB

Postgres Version : 16.x
Shared Buffers : 75 GB
Effective_cache :  175 GB
Work _mem : 4MB
Max_connections : 3000

OS  : Ubuntu 22.04
Ram : 384 GB
CPU : 64

Please let us know if you need any further information or if there are additional details required.  


Regards,
Ram.

Re: Performance Issue with Hash Partition Query Execution in PostgreSQL 16

From
David Mullineux
Date:
Thanks for correction. At this point I would be trying to modify 
plan_cache_mode
for the session which uses the bond variable. alter it so that plan_cache_mode=force_custom_plan
One hypothesis is that, a bad plan got cached for that SQL pattern. Obviously, when you run it manually you are always getting a custom plan as it's not a prepared statement.  




On Sat, 9 Nov 2024, 03:46 ravi k, <ravisql09@gmail.com> wrote:
Sorry, it was typo. Bind variable is bigint only.

Thanks 

On Fri, 8 Nov, 2024, 7:09 pm David Mullineux, <dmullx@gmail.com> wrote:
Just spotted a potential problem. The indexed column is a bigint. Are you, in your prepared statement passing a string or a big int ?
I notice your plan is doing an implicit type conversion when you run it manually.
Sometimes the wrong type will make it not use the index.

On Fri, 8 Nov 2024, 03:07 ravi k, <ravisql09@gmail.com> wrote:
Hi ,

Thanks for the suggestions.

Two more observations:

1) no sequence scan noticed from pg_stat_user_tables ( hope stats are accurate in postgres 16) if parameter sniffing happens the possibility of going to  sequence scan is more right.

2) no blockings or IO issue during the time.

3) even with limit clause if touch all partitions also it could have been completed in milliseconds as this is just one record.

4) auto_explain in prod we cannot enable as this is expensive and with high TPS we may face latency issues and lower environment this issue cannot be reproduced,( this is happening out of Million one case)

This looks puzzle to us, just in case anyone experianced pls share your experience.

Regards,
Ravi

On Thu, 7 Nov, 2024, 3:41 am David Mullineux, <dmullx@gmail.com> wrote:
It might be worth eliminating the use of cached plans here. Is your app using prepared statements at all?  
Point is that if the optimizer sees the same prepared query , 5 times, the  it locks the plan that it found at that time. This is a good trade off as it avoids costly planning-time for repetitive queries. But if you are manually querying, the  a custom plan will be generated  anew.
A quick analyze of the table should reset the stats and invalidate any cached plans.
This may not be your problem  just worth eliminating it from the list of potential causes.

On Wed, 6 Nov 2024, 17:14 Ramakrishna m, <ram.pgdb@gmail.com> wrote:
Hi Team,

One of the queries, which retrieves a single record from a table with 16 hash partitions, is taking more than 10 seconds to execute. In contrast, when we run the same query manually, it completes within milliseconds. This issue is causing exhaustion of the application pools. Do we have any bugs in postgrs16 hash partitions? Please find the attached log, table, and execution plan. 

size of the each partitions : 300GB 
Index Size : 12GB

Postgres Version : 16.x
Shared Buffers : 75 GB
Effective_cache :  175 GB
Work _mem : 4MB
Max_connections : 3000

OS  : Ubuntu 22.04
Ram : 384 GB
CPU : 64

Please let us know if you need any further information or if there are additional details required.  


Regards,
Ram.
Thanks for the advice!

I am planing to set session level!

but before that one more observations noticed i.e One more table has same issue, which is having similar like hash partitions.

And I scheduled manual analyze for all parent hash tables(thus all stats will update together).

After this change I didn't noticed the issue, not sure does this addressed issue or not, just monitoring if this not works will set custom plan in session level.

I have seen in SQL server  parameter sniffing regularly but in postgres I never  experienced. I am still wondering does this sniffing or not as from stats I didn't notice any sequence scan.

Best,







On Sat, 9 Nov, 2024, 3:40 pm David Mullineux, <dmullx@gmail.com> wrote:
Thanks for correction. At this point I would be trying to modify 
plan_cache_mode
for the session which uses the bond variable. alter it so that plan_cache_mode=force_custom_plan
One hypothesis is that, a bad plan got cached for that SQL pattern. Obviously, when you run it manually you are always getting a custom plan as it's not a prepared statement.  




On Sat, 9 Nov 2024, 03:46 ravi k, <ravisql09@gmail.com> wrote:
Sorry, it was typo. Bind variable is bigint only.

Thanks 

On Fri, 8 Nov, 2024, 7:09 pm David Mullineux, <dmullx@gmail.com> wrote:
Just spotted a potential problem. The indexed column is a bigint. Are you, in your prepared statement passing a string or a big int ?
I notice your plan is doing an implicit type conversion when you run it manually.
Sometimes the wrong type will make it not use the index.

On Fri, 8 Nov 2024, 03:07 ravi k, <ravisql09@gmail.com> wrote:
Hi ,

Thanks for the suggestions.

Two more observations:

1) no sequence scan noticed from pg_stat_user_tables ( hope stats are accurate in postgres 16) if parameter sniffing happens the possibility of going to  sequence scan is more right.

2) no blockings or IO issue during the time.

3) even with limit clause if touch all partitions also it could have been completed in milliseconds as this is just one record.

4) auto_explain in prod we cannot enable as this is expensive and with high TPS we may face latency issues and lower environment this issue cannot be reproduced,( this is happening out of Million one case)

This looks puzzle to us, just in case anyone experianced pls share your experience.

Regards,
Ravi

On Thu, 7 Nov, 2024, 3:41 am David Mullineux, <dmullx@gmail.com> wrote:
It might be worth eliminating the use of cached plans here. Is your app using prepared statements at all?  
Point is that if the optimizer sees the same prepared query , 5 times, the  it locks the plan that it found at that time. This is a good trade off as it avoids costly planning-time for repetitive queries. But if you are manually querying, the  a custom plan will be generated  anew.
A quick analyze of the table should reset the stats and invalidate any cached plans.
This may not be your problem  just worth eliminating it from the list of potential causes.

On Wed, 6 Nov 2024, 17:14 Ramakrishna m, <ram.pgdb@gmail.com> wrote:
Hi Team,

One of the queries, which retrieves a single record from a table with 16 hash partitions, is taking more than 10 seconds to execute. In contrast, when we run the same query manually, it completes within milliseconds. This issue is causing exhaustion of the application pools. Do we have any bugs in postgrs16 hash partitions? Please find the attached log, table, and execution plan. 

size of the each partitions : 300GB 
Index Size : 12GB

Postgres Version : 16.x
Shared Buffers : 75 GB
Effective_cache :  175 GB
Work _mem : 4MB
Max_connections : 3000

OS  : Ubuntu 22.04
Ram : 384 GB
CPU : 64

Please let us know if you need any further information or if there are additional details required.  


Regards,
Ram.