From Andres Freund
Subject Re: Refactoring postmaster's code to cleanup after child exit
Msg-id gdojfusnbe3ae47n6qjezclpv4462xbdc2ssoadtyklgdw5dqb@b4r5yw3fcifm
In response to Re: Refactoring postmaster's code to cleanup after child exit  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Hi,

On 2024-12-10 12:00:12 +0200, Heikki Linnakangas wrote:
> On 09/12/2024 22:55, Heikki Linnakangas wrote:
> > Not sure how to fix this. A small sleep in the test would work, but in
> > principle there's no delay that's guaranteed to be enough. A more robust
> > solution would be to run a "select count(*) from pg_stat_activity" and
> > wait until the number of connections is what's expected. I'll try that
> > and see how complicated that gets..
> 
> Checking pg_stat_activity doesn't help, because the backend doesn't register
> itself in pg_stat_activity until later. A connection that's rejected due to
> connection limits never shows up in pg_stat_activity.
> 
> Some options:
> 
> 0. Do nothing
> 
> 1. Add a small sleep to the test
> 
> 2. Move the pgstat_bestart() call earlier in the startup sequence, so that a
> backend shows up in pg_stat_activity before it acquires a PGPROC entry, and
> stays visible until after it has released its PGPROC entry. This would give
> more visibility to backends that are starting up.

We don't necessarily *have* a PGPROC entry for that backend when we run out of
connections, no?



> 3. Rearrange the FATAL error handling so that the process removes itself
> from PGPROC before sending the error to the client. That would be kind of
> nice anyway. Currently, if sending the rejection error message to the client
> blocks, you are holding up a PGPROC slot until the message is sent. The
> error message packet is short, so it's highly unlikely to block, but still.

This is definitely a problem; there was even a recent thread about it. It can
be triggered even with just an ERROR message, though :(


For this test, could we perhaps rely on the messages the postmaster logs when
child processes exit?

2025-03-04 17:56:12.528 EST [3509838][not initialized][:0][[unknown]] LOG:  connection received: host=[local]
2025-03-04 17:56:12.528 EST [3509838][client backend][:0][[unknown]] FATAL:  sorry, too many clients already
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG:  releasing pm child slot 2
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG:  client backend (PID 3509838) exited with exit code 1

I.e. the test could wait for the 'client backend exited' message using
->wait_for_log()?
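
To make the idea concrete, here is a rough, standalone sketch of the polling
logic: read the server log from a remembered byte offset and wait until the
'client backend ... exited' line shows up. It is written in Python only so it
can be run on its own; the real test would be Perl and use ->wait_for_log().
The function name, the regex, and the timeout/poll parameters are all made up
for illustration, and it assumes the server's log level is verbose enough for
the DEBUG lines above to be written at all.

```python
import re
import time

# Pattern modelled on the postmaster DEBUG line quoted above (an assumption
# about the exact wording; adjust if the message text differs).
EXIT_RE = re.compile(rb"client backend \(PID (\d+)\) exited with exit code (\d+)")


def wait_for_backend_exit(log_path, offset=0, timeout=5.0, poll_interval=0.1):
    """Poll log_path, starting at byte `offset`, for a backend-exit line.

    Returns (pid, exit_code, new_offset), where new_offset points just past
    the matched text so a later call does not match the same line again.
    Raises TimeoutError if nothing matches before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Re-read everything after the offset on each iteration; the log
        # may have grown since the previous attempt.
        with open(log_path, "rb") as log:
            log.seek(offset)
            chunk = log.read()

        match = EXIT_RE.search(chunk)
        if match:
            pid = int(match.group(1))
            exit_code = int(match.group(2))
            return pid, exit_code, offset + match.end()

        if time.monotonic() >= deadline:
            raise TimeoutError("no backend exit message found in " + log_path)
        time.sleep(poll_interval)
```

The point of carrying the offset around is that the test only reacts to an
exit logged after the rejected connection attempt, not to an older one that
happens to be in the log already.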

Greetings,

Andres Freund
