Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Refactoring postmaster's code to cleanup after child exit
Date
Msg-id ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
Whole thread Raw
In response to Re: Refactoring postmaster's code to cleanup after child exit  (Noah Misch <noah@leadboat.com>)
Responses Re: Refactoring postmaster's code to cleanup after child exit
List pgsql-hackers
Hi,

On 2025-03-05 20:49:33 -0800, Noah Misch wrote:
> > This behaviour makes it really hard to debug problems. It'd have been a lot
> > easier to understand the problem if we'd seen psql's stderr before the test
> > died.
> > 
> > I guess that mean at the very least we'd need to put an eval {} around the
> > ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error?
> 
> That sounds right.

In the attached patch I did that for wait_connect().  I did verify that it
works by implementing the wait_connect() fix before fixing
002_connection_limits.pl, which fails if a sleep(1) is added just before the
proc_exit(1) for FATAL.


I didn't yet tackle pump_until() yet as it

a) uses pumpable() to check if it's safe to pump() and should kinda sometimes
   maybe report an error, even though the fact that it doesn't display stderr
   (if stout is waited on) makes it harder to debug.

b) Fixing the error report seems like it'd require an interface change to
   pump_until().


> Officially, you could call ->pumpable() before ->pump().  It's defined as
> 'Returns TRUE if calling pump() won't throw an immediate "process ended
> prematurely" exception.'

It's also documented to be internal only...

I do share your doubts re pumpable():

> I lack high confidence that it avoids the exception,
> because the pump() still calls pumpable()->reap_nb()->waitpid(WNOHANG) and may
> decide "process ended prematurely" based on the new finding.  In other words,
> I bet there would be a TOCTOU defect in "$h->pump if $h->pumpable".


On 2025-03-05 08:23:32 +0900, Michael Paquier wrote:
> > For this test, could we perhaps rely on the log messages postmaster logs when
> > child processes exit?
> > 
> > 2025-03-04 17:56:12.528 EST [3509838][not initialized][:0][[unknown]] LOG:  connection received: host=[local]
> > 2025-03-04 17:56:12.528 EST [3509838][client backend][:0][[unknown]] FATAL:  sorry, too many clients already
> > 2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG:  releasing pm child slot 2
> > 2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG:  client backend (PID 3509838) exited with exit code
1
> > 
> > I.e. the test could wait for the 'client backend exited' message using
> > ->wait_for_log()?
> 
> Matching expected contents in the server logs is a practice I've found
> to be rather reliable, with wait_for_log().

The attached patch implements that approach. It does fix the problem from what
I can tell. It's not great that it requires log_min_messages = DEBUG2, but
that seems ok for this test.


> Why not adding an injection point with a WARNING or a LOG generated, then
> check the server logs for the code path taken based on the elog() generated
> with the point name?

I think the log_min_messages approach is a lot simpler. If we need something
like this more widely we can reconsider injection points...


I also attached a patch to improve connect_fails()/connect_ok() test names a
bit. They weren't symmetric and I felt they were lacking in detail for the
psql return code check.


Another annoying and also funny problem I saw is this failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-06%2009%3A18%3A21
2025-03-06 10:42:02.552 UTC [372451][postmaster][:0] LOG:  1800 s is outside the valid range for parameter
"authentication_timeout"(1 s .. 600 s)
 

I had to increase PG_TEST_TIMEOUT_DEFAULT due to some other test timing out
when run under valgrind (due to having to insert a lot of rows). But then this
test runs into the above issue.

The easiest way seems to be to just limit PG_TEST_TIMEOUT_DEFAULT in this
test.

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Statistics Import and Export
Next
From: Heikki Linnakangas
Date:
Subject: Re: Refactoring postmaster's code to cleanup after child exit