Hi,

On 2024-12-10 12:00:12 +0200, Heikki Linnakangas wrote:
> On 09/12/2024 22:55, Heikki Linnakangas wrote:
> > Not sure how to fix this. A small sleep in the test would work, but in
> > principle there's no delay that's guaranteed to be enough. A more robust
> > solution would be to run a "select count(*) from pg_stat_activity" and
> > wait until the number of connections are what's expected. I'll try that
> > and see how complicated that gets..
>
> Checking pg_stat_activity doesn't help, because the backend doesn't register
> itself in pg_stat_activity until later. A connection that's rejected due to
> connection limits never shows up in pg_stat_activity.
>
> Some options:
>
> 0. Do nothing
>
> 1. Add a small sleep to the test
>
> 2. Move the pgstat_bestart() call earlier in the startup sequence, so that a
> backend shows up in pg_stat_activity before it acquires a PGPROC entry, and
> stays visible until after it has released its PGPROC entry. This would give
> more visibility to backends that are starting up.

We don't necessarily *have* a PGPROC entry for that backend when we run out of
connections, no?

> 3. Rearrange the FATAL error handling so that the process removes itself
> from PGPROC before sending the error to the client. That would be kind of
> nice anyway. Currently, if sending the rejection error message to the client
> blocks, you are holding up a PGPROC slot until the message is sent. The
> error message packet is short, so it's highly unlikely to block, but still.

This is definitely a problem; there was even a recent thread about it. It can
be triggered even with just an ERROR message though :(

For this test, could we perhaps rely on the messages postmaster logs when
child processes exit?

2025-03-04 17:56:12.528 EST [3509838][not initialized][:0][[unknown]] LOG: connection received: host=[local]
2025-03-04 17:56:12.528 EST [3509838][client backend][:0][[unknown]] FATAL: sorry, too many clients already
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: releasing pm child slot 2
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: client backend (PID 3509838) exited with exit code 1

I.e. the test could wait for the 'client backend exited' message using
->wait_for_log()?
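
To illustrate the idea outside the TAP framework, here is a rough standalone
sketch in Python of what such a wait amounts to: poll the server log from a
remembered offset until a regex matches or a timeout expires. This is not the
PostgreSQL::Test::Cluster code. The function name is borrowed only for
readability, and the parameters, the defaults and the regex are assumptions
made for this example. It also assumes the server logs at a level that emits
the DEBUG lines shown above.

```python
import re
import time


def wait_for_log(path, pattern, offset=0, timeout=180.0, interval=0.1):
    """Poll the log file at `path` until `pattern` matches text past `offset`.

    Hypothetical helper for illustration only. Returns the byte position at
    the end of the data that was read, which a caller can pass as `offset`
    for the next wait so that older messages are not matched again.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        # Re-open on every iteration so freshly appended lines are seen.
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
        if regex.search(data.decode("utf-8", errors="replace")):
            return offset + len(data)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pattern {pattern!r} not found in {path}")
        time.sleep(interval)
```

The test would record the log size before attempting the connection that is
expected to be rejected, then wait for something like
r"client backend \(PID \d+\) exited with exit code 1" starting at that
offset, and only then try the next connection.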

Greetings,

Andres Freund