Thread: BUG #18141: sorry, too many clients error occurring very frequently
From: PG Bug reporting form <noreply@postgresql.org>
The following bug has been logged on the website:

Bug reference:      18141
Logged by:          Ravi Agrawal
Email address:      ravi.s.agrawal23@gmail.com
PostgreSQL version: 13.11
Operating system:   RHEL
Description:

Hi Team,

We have upgraded PostgreSQL from version 13.4 to 13.11 and are observing the
'sorry, too many clients' error very frequently. The connection count shoots
up while performing basic operations and crosses the max_connections value.
This was not observed previously on PostgreSQL v13.4.

We need to understand whether this issue is caused by the change in version.
Is there any history of known incidents of this issue with this specific
version of PostgreSQL?

Appreciate your response on this.

Many Thanks,
Ravi Agrawal
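For anyone reproducing this symptom, a quick way to see what the piled-up
connections are actually doing is to query pg_stat_activity while the count
is climbing (a generic diagnostic sketch, not something from the report):

    -- Break down backends by state: active queries, plain idle sessions,
    -- or idle-in-transaction sessions holding connections open.
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY count(*) DESC;

    -- Compare the total against the configured limit.
    SELECT count(*) AS connections,
           current_setting('max_connections') AS max_connections
    FROM pg_stat_activity;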
From: "Wetmore, Matthew (CTR)"
I can second this behavior. We did a few things to mitigate the issue. HA may
be at play here too. We're still monitoring.

13.11 + HA: these issues
13.6, no HA: no issues
On 9/29/23 10:31, PG Bug reporting form wrote:
> We have upgraded PostgreSQL from version 13.4 to 13.11 and are observing
> the 'sorry, too many clients' error very frequently. The connection count
> shoots up while performing basic operations and crosses the
> max_connections value. This was not observed previously on PostgreSQL
> v13.4.
> We need to understand whether this issue is caused by the change in
> version. Is there any history of known incidents of this issue with this
> specific version of PostgreSQL?

Question -- are you using pg_stat_statements? If so, please try disabling it
and see if the issue goes away.

Also, what is your work_mem setting? Does raising it help?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
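The checks Joe suggests can be run roughly as follows (an illustrative
sketch: the values are arbitrary, changing settings needs appropriate
privileges, and removing pg_stat_statements from shared_preload_libraries
requires a server restart):

    -- Is pg_stat_statements loaded, and what is work_mem set to?
    SHOW shared_preload_libraries;
    SHOW work_mem;

    -- Try a larger work_mem in one session before changing it globally.
    SET work_mem = '64MB';

    -- Persist the change for new sessions after a config reload.
    ALTER SYSTEM SET work_mem = '64MB';
    SELECT pg_reload_conf();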
On Fri, Sep 29, 2023 at 7:55 PM PG Bug reporting form <noreply@postgresql.org> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      18141
> Logged by:          Ravi Agrawal
> Email address:      ravi.s.agrawal23@gmail.com
> PostgreSQL version: 13.11
> Operating system:   RHEL
> Description:
>
> Hi Team,
>
> We have upgraded PostgreSQL from version 13.4 to 13.11 and are observing
> the 'sorry, too many clients' error very frequently. The connection count
> shoots up while performing basic operations and crosses the
> max_connections value. This was not observed previously on PostgreSQL
> v13.4.
>
> We need to understand whether this issue is caused by the change in
> version. Is there any history of known incidents of this issue with this
> specific version of PostgreSQL?
I think it is unlikely to be a bug. How did you do the upgrade? Just install the new binaries, then restart the server?
Can you capture plans from the newly slow queries and compare them to plans from the same queries before the upgrade?
Cheers,
Jeff
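For capturing plans the way Jeff suggests, EXPLAIN on the affected
statements is the direct route, and auto_explain can log plans for
statements that are hard to rerun by hand (a generic sketch; the query shown
is just an example, and LOAD/auto_explain settings require appropriate
privileges):

    -- Capture the current plan of a slow statement; substitute one of the
    -- newly slow queries here.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20;

    -- Log plans of anything slower than one second in this session.
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '1s';
    SET auto_explain.log_analyze = on;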
On 9/30/23 14:04, Jeff Janes wrote:
> I think it is unlikely to be a bug. How did you do the upgrade? Just
> install the new binaries, then restart the server?

Maybe, maybe not. I have seen two other cases that are similar. One was an
upgrade from 12.8 to 12.12 and the other an upgrade from 13.4 to 13.8.

I checked, and 12.8 was stamped the same date as 13.4, and 12.12 the same
day as 13.8.

In both cases queries against pg_stat_statements suddenly started taking
more memory, leading to spillage to pg_temp and a step degradation in
overall performance. In at least one of those cases the solution/workaround
was to increase work_mem. In the other I think pg_stat_statements was
disabled.

Myself and at least one other hacker looked at the pg_stat_statements
specific changes in that time interval and saw no smoking gun.

But it is possible that something else backpatched to both branches between
Aug 09, 2021 and Aug 8, 2022 has caused a more general performance
regression which we have yet to track down. At least based on this sample
of two (now maybe 3?) folks with similar symptoms.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
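Temp file spillage of the kind Joe describes is visible in the cumulative
statistics and can also be logged per file (sketch only; a threshold of 0
logs every temp file, which can be noisy on a busy system):

    -- Cumulative temp file usage per database since the last stats reset.
    SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_spilled
    FROM pg_stat_database
    ORDER BY temp_bytes DESC;

    -- Log each temp file created, with its size, to the server log.
    ALTER SYSTEM SET log_temp_files = 0;
    SELECT pg_reload_conf();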
Joe Conway <mail@joeconway.com> writes:
> Maybe, maybe not. I have seen two other cases that are similar. One was
> an upgrade from 12.8 to 12.12 and the other an upgrade from 13.4 to 13.8.

> I checked, and 12.8 was stamped the same date as 13.4, and 12.12 the same
> day as 13.8.

> In both cases queries against pg_stat_statements suddenly started taking
> more memory, leading to spillage to pg_temp and a step degradation in
> overall performance. In at least one of those cases the
> solution/workaround was to increase work_mem. In the other I think
> pg_stat_statements was disabled.

> Myself and at least one other hacker looked at the pg_stat_statements
> specific changes in that time interval and saw no smoking gun.

> But it is possible that something else backpatched to both branches
> between Aug 09, 2021 and Aug 8, 2022 has caused a more general
> performance regression which we have yet to track down.

Hmm. My first instinct is to wonder about changes in plan selection.
How complex were the troublesome queries?

			regards, tom lane
On 10/1/23 14:09, Tom Lane wrote:
> Hmm. My first instinct is to wonder about changes in plan selection.
> How complex were the troublesome queries?

I think both of these cases involved a number of common attributes:

* The queries against pg_stat_statements were relatively complex
* The other queries on the system were relatively long and complex (and
  thus the query string length in pg_stat_statements)
* Prior to the upgrade the systems were overall keeping up, but extremely
  busy

In one case it seems that the upgrade caused a significant increase in temp
file usage. This impacted the system enough that other active queries took
longer, and thus the number of active connections increased. Raising
work_mem eliminated the temp file usage, and CPU loads dropped back to
levels similar to those prior to the minor upgrade.

The other case had different specifics, but generally involved increased
memory usage. In that one, eliminating the use of pg_stat_statements
restored performance. They did not try raising work_mem (as I understand
it), and I did not get any info regarding temp file spillage there.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
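If raising work_mem is the chosen workaround, it does not have to be raised
globally; it can be scoped to whatever role runs the monitoring queries, and
per-query temp usage can be checked directly in pg_stat_statements (sketch
only; monitoring_user is a placeholder role name and the value is arbitrary):

    -- Raise work_mem only for the role that runs the pg_stat_statements
    -- queries, rather than for every backend on the system.
    ALTER ROLE monitoring_user SET work_mem = '128MB';

    -- Which statements are spilling to temp files, and how much?
    SELECT queryid, calls, temp_blks_written
    FROM pg_stat_statements
    ORDER BY temp_blks_written DESC
    LIMIT 10;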
Joe Conway <mail@joeconway.com> writes:
> On 10/1/23 14:09, Tom Lane wrote:
>> Joe Conway <mail@joeconway.com> writes:
>>> But it is possible that something else backpatched to both branches
>>> between Aug 09, 2021 and Aug 8, 2022 has caused a more general
>>> performance regression which we have yet to track down.

>> Hmm. My first instinct is to wonder about changes in plan selection.
>> How complex were the troublesome queries?

> I think both of these cases involved a number of common attributes:
> * The queries against pg_stat_statements were relatively complex
> * The other queries on the system were relatively long and complex (and
>   thus the query string length in pg_stat_statements)
> * Prior to the upgrade the systems were overall keeping up, but extremely
>   busy

> In one case it seems that the upgrade caused a significant increase in
> temp file usage. This impacted the system enough that other active
> queries took longer, and thus the number of active connections increased.
> Raising work_mem eliminated the temp file usage, and CPU loads dropped
> back to levels similar to those prior to the minor upgrade.

Interesting. After some desultory trawling through the commit log,
I'm wondering if there could be some connection to

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: REL_13_STABLE Release: REL_13_5 [dede14399] 2021-09-20 11:48:52 -0400
Branch: REL_12_STABLE Release: REL_12_9 [f230614da] 2021-09-20 11:48:52 -0400
Branch: REL_11_STABLE Release: REL_11_14 [914e54501] 2021-09-20 11:48:52 -0400
Branch: REL_10_STABLE Release: REL_10_19 [923b7efc2] 2021-09-20 11:48:52 -0400
Branch: REL9_6_STABLE Release: REL9_6_24 [183b3aced] 2021-09-20 11:48:52 -0400

    Don't elide casting to typmod -1.

That's in the right timeframe for these reports. A fairly direct connection
to pg_stat_statements could be made: addition of RelabelType nodes that
weren't there before would change the query ID hash of affected statements,
so that after running for a little while the pg_stat_statements hash table
would have a bunch of duplicate entries (assuming you didn't bump up against
pg_stat_statements.max). As long as nothing happened to age out the now-dead
entries with the old query hashes, you'd have more rows in the
pg_stat_statements view than before, which could explain performance
decreases in queries on that view.

The big hole in this theory is that I would not have expected this casting
change to affect any large proportion of SQL commands, so that it's a bit
hard to credit it causing a lot of bloat in pg_stat_statements. Still, some
specific coding habit or DDL detail could maybe allow that to happen in a
particular application.

Another line of thought is that the extra RelabelType nodes could block a
planner optimization that used to occur before. It's not apparent why that
would manifest specifically in connection with pg_stat_statements queries,
though. If that is the problem, we'd need an example of a query whose plan
changed in order to pin down the cause.

Anyway, that's just a theory, and it might be hot air.

			regards, tom lane
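If Tom's theory holds, it should be directly observable: the same query text
would show up under more than one query ID, and the number of entries would
creep toward pg_stat_statements.max. A rough check (sketch only, run against
the affected database with pg_stat_statements installed):

    -- Query texts that now hash to more than one query ID.
    SELECT left(query, 60) AS query_prefix,
           count(DISTINCT queryid) AS query_ids
    FROM pg_stat_statements
    GROUP BY query
    HAVING count(DISTINCT queryid) > 1
    ORDER BY query_ids DESC
    LIMIT 20;

    -- How close the hash table is to its configured size.
    SELECT count(*) AS entries,
           current_setting('pg_stat_statements.max') AS max_entries
    FROM pg_stat_statements;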