Re: BUG #18141: sorry, too many clients error occurring very frequently - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #18141: sorry, too many clients error occurring very frequently
Date
Msg-id 1860466.1696189555@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #18141: sorry, too many clients error occurring very frequently  (Joe Conway <mail@joeconway.com>)
List pgsql-bugs
Joe Conway <mail@joeconway.com> writes:
> On 10/1/23 14:09, Tom Lane wrote:
>> Joe Conway <mail@joeconway.com> writes:
>>> But it is possible that something else backpatched to both branches 
>>> between Aug 09, 2021 and Aug 8, 2022 has caused a more general 
>>> performance regression which we have yet to track down.

>> Hmm.  My first instinct is to wonder about changes in plan selection.
>> How complex were the troublesome queries?

> I think both of these cases involved a number of common attributes:
> * The queries against pg_stat_statements were
>    relatively complex
> * The other queries on the system were relatively
>    long and complex (and thus the query string length
>    in pg_stat_statements)
> * Prior to the upgrade the systems were overall
>    keeping up, but extremely busy
> In one case it seems that the upgrade caused a significant increase of 
> temp file usage. This impacted the system enough that other active 
> queries took longer, and thus number of active connections increased. 
> Raising work_mem eliminated the temp file usage and cpu loads dropped 
> back to similar levels as they were prior to the minor upgrade.

Interesting.  After some desultory trawling through the commit log,
I'm wondering if there could be some connection to

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: REL_13_STABLE Release: REL_13_5 [dede14399] 2021-09-20 11:48:52 -0400
Branch: REL_12_STABLE Release: REL_12_9 [f230614da] 2021-09-20 11:48:52 -0400
Branch: REL_11_STABLE Release: REL_11_14 [914e54501] 2021-09-20 11:48:52 -0400
Branch: REL_10_STABLE Release: REL_10_19 [923b7efc2] 2021-09-20 11:48:52 -0400
Branch: REL9_6_STABLE Release: REL9_6_24 [183b3aced] 2021-09-20 11:48:52 -0400

    Don't elide casting to typmod -1.

That's in the right timeframe for these reports.  A fairly direct
connection to pg_stat_statements could be made: addition of
RelabelType nodes that weren't there before would change the query ID
hash of affected statements, so that after running for a little while
the pg_stat_statements hash would have a bunch of duplicate entries
(assuming you didn't bump up against pg_stat_statements.max).
As long as nothing happened to age out the now-dead entries with
the old query hashes, you'd have more rows in the pg_stat_statements
view than before, which could explain performance decreases in
queries on that view.

The big hole in this theory is that I would not have expected this
casting change to affect any large proportion of SQL commands,
so that it's a bit hard to credit it causing a lot of bloat in
pg_stat_statements.  Still, some specific coding habit or DDL
detail could maybe allow that to happen in a particular application.

Another line of thought is that the extra RelabelType nodes could
block a planner optimization that used to occur before.  It's not
apparent why that would manifest specifically in connection with
pg_stat_statements queries though.  If that is the problem,
we'd need an example of a query whose plan changed in order to pin
down the cause.

Anyway, that's just a theory, and it might be hot air.

            regards, tom lane



pgsql-bugs by date:

Previous
From: Joe Conway
Date:
Subject: Re: BUG #18141: sorry, too many clients error occurring very frequently
Next
From: Michael Paquier
Date:
Subject: Re: BUG #18134: ROW_COUNT do not set to 0 when psql's \gset command get no rows returned