Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Refactoring postmaster's code to cleanup after child exit
Msg-id 4d964960-72fa-4741-8a7c-879b958eaa29@iki.fi
In response to Re: Refactoring postmaster's code to cleanup after child exit  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Refactoring postmaster's code to cleanup after child exit
List pgsql-hackers
On 09/12/2024 22:55, Heikki Linnakangas wrote:
> Not sure how to fix this. A small sleep in the test would work, but in 
> principle there's no delay that's guaranteed to be enough. A more robust 
> solution would be to run a "select count(*) from pg_stat_activity" and 
> wait until the number of connections are what's expected. I'll try that 
> and see how complicated that gets..

Checking pg_stat_activity doesn't help, because the backend doesn't 
register itself in pg_stat_activity until later. A connection that's 
rejected due to connection limits never shows up in pg_stat_activity.
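To illustrate the gap, here is a toy C model. It is not PostgreSQL code: ServerState and all the functions are made-up names, and the two counters merely stand in for the PGPROC array and the rows of pg_stat_activity. A connection that is rejected after taking a slot is never counted, so a poll on the visible count can report that things have settled while that slot is still in use:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not PostgreSQL code: two counters standing in for the
 * PGPROC array and for the rows of pg_stat_activity. */
typedef struct
{
    int slots_held;  /* PGPROC-like entries currently in use */
    int visible;     /* backends registered in the activity view */
} ServerState;

/* A backend that completes startup: it takes a slot, then registers. */
static void backend_start_ok(ServerState *s)
{
    s->slots_held++;
    s->visible++;
}

/* A backend rejected by a connection limit: it takes a slot but hits
 * FATAL before the point where it would register, so it is never
 * visible. */
static void backend_rejected_begin(ServerState *s)
{
    s->slots_held++;
}

/* The rejected backend's exit, which can happen some time after the
 * client has already received the error. */
static void backend_rejected_exit(ServerState *s)
{
    s->slots_held--;
    assert(s->slots_held >= 0);
}

/* The check a test would poll: "do we see as many sessions as expected?" */
static bool poll_says_settled(const ServerState *s, int expected)
{
    return s->visible == expected;
}
```

With one ordinary session and one rejected connection that hasn't exited yet, poll_says_settled(&s, 1) already returns true while s.slots_held is still 2. A test that waits on pg_stat_activity would presumably race in exactly that window.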

Some options:

0. Do nothing

1. Add a small sleep to the test

2. Move the pgstat_bestart() call earlier in the startup sequence, so 
that a backend shows up in pg_stat_activity before it acquires a PGPROC 
entry, and stays visible until after it has released its PGPROC entry. 
This would give more visibility to backends that are starting up.

3. Rearrange the FATAL error handling so that the process removes itself 
from PGPROC before sending the error to the client. That would be kind 
of nice anyway. Currently, if sending the rejection error message to the 
client blocks, you are holding up a PGPROC slot until the message is 
sent. The error message packet is short, so it's highly unlikely to 
block, but still.

Option 3 seems kind of nice in principle, but looking at the code, it's 
a bit awkward to implement. The easiest way to implement it would be to 
modify send_message_to_frontend() to not call pq_flush() on FATAL 
errors, and to flush the data in socket_close() instead. That's not a lot 
of code, but it's a pretty ugly special case.
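A rough sketch of that special case, as a self-contained C model rather than a patch: model_send_message(), model_flush(), model_release_slot() and model_socket_close() are hypothetical stand-ins for send_message_to_frontend(), pq_flush(), the PGPROC release at exit, and socket_close(). It assumes, as the proposal implies, that the socket is closed only after the PGPROC entry has been released. Non-FATAL messages are flushed right away; FATAL ones stay in the output buffer until close:

```c
#include <assert.h>
#include <string.h>

/* Toy model, not PostgreSQL code; all names here are hypothetical. */
typedef enum { MSG_ERROR, MSG_FATAL } MsgLevel;

typedef struct
{
    char   outbuf[256];        /* queued bytes not yet written to the socket */
    size_t outlen;
    char   wire[256];          /* bytes the client has actually received */
    size_t wirelen;
    int    slot_held;          /* 1 while we own a PGPROC-like entry */
    int    slot_held_at_flush; /* value of slot_held when data last hit the wire */
} ModelConn;

/* Stand-in for pq_flush(): move queued bytes onto the wire. */
static void model_flush(ModelConn *c)
{
    if (c->outlen == 0)
        return;
    assert(c->wirelen + c->outlen <= sizeof(c->wire));
    memcpy(c->wire + c->wirelen, c->outbuf, c->outlen);
    c->wirelen += c->outlen;
    c->outlen = 0;
    c->slot_held_at_flush = c->slot_held;
}

/* Stand-in for send_message_to_frontend(): queue the message and flush
 * at once, except for FATAL, which is left for model_socket_close(). */
static void model_send_message(ModelConn *c, MsgLevel level, const char *msg)
{
    size_t n = strlen(msg);

    assert(c->outlen + n <= sizeof(c->outbuf));
    memcpy(c->outbuf + c->outlen, msg, n);
    c->outlen += n;
    if (level != MSG_FATAL)
        model_flush(c);
}

/* Stand-in for giving up the PGPROC entry during process exit. */
static void model_release_slot(ModelConn *c)
{
    c->slot_held = 0;
}

/* Stand-in for socket_close(): send whatever a FATAL left queued. */
static void model_socket_close(ModelConn *c)
{
    model_flush(c);
}
```

In this model the FATAL text reaches the client only once slot_held is already 0, which is the ordering option 3 is after, while an ordinary ERROR still goes out immediately. The price is the level check in the send path, i.e. the special case that makes this approach unattractive.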

Option 2 seems nice too, but it looks like a lot of work.
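The invariant option 2 would buy can be checked with another toy model. Again this is not PostgreSQL code: Event, Counts, apply() and visible_covers_slots() are made up, and the event orders are read off the description of option 2 rather than taken from the source. If registration in the activity view brackets the PGPROC lifetime, every held slot belongs to a visible backend, so waiting for the pg_stat_activity count to drop would also mean waiting for the slot to be freed:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not PostgreSQL code. Each step of a backend's life is an
 * event; the order of these events is what option 2 would change. */
typedef enum
{
    EV_ACQUIRE_SLOT,   /* take a PGPROC-like entry */
    EV_REGISTER,       /* become visible in the activity view */
    EV_UNREGISTER,     /* disappear from the activity view */
    EV_RELEASE_SLOT    /* give the entry back */
} Event;

typedef struct
{
    int slots_held;
    int visible;
} Counts;

static void apply(Counts *c, Event e)
{
    switch (e)
    {
        case EV_ACQUIRE_SLOT: c->slots_held++; break;
        case EV_RELEASE_SLOT: c->slots_held--; break;
        case EV_REGISTER:     c->visible++;    break;
        case EV_UNREGISTER:   c->visible--;    break;
    }
    assert(c->slots_held >= 0 && c->visible >= 0);
}

/* Replays a sequence and checks, after every step, the property a test
 * would rely on: no more slots are held than backends are visible. */
static bool visible_covers_slots(const Event *events, int n)
{
    Counts c = {0, 0};

    for (int i = 0; i < n; i++)
    {
        apply(&c, events[i]);
        if (c.slots_held > c.visible)
            return false;
    }
    return true;
}
```

Replaying {EV_ACQUIRE_SLOT, EV_REGISTER, EV_UNREGISTER, EV_RELEASE_SLOT} fails at the very first step, and a rejected backend that never registers at all ({EV_ACQUIRE_SLOT, EV_RELEASE_SLOT}) fails too. With option 2's order, {EV_REGISTER, EV_ACQUIRE_SLOT, EV_RELEASE_SLOT, EV_UNREGISTER}, the check holds throughout.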

-- 
Heikki Linnakangas
Neon (https://neon.tech)
