Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Refactoring postmaster's code to cleanup after child exit
Date
Msg-id 20250306044933.7a.nmisch@google.com
Whole thread Raw
In response to Re: Refactoring postmaster's code to cleanup after child exit  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Mar 04, 2025 at 05:50:34PM -0500, Andres Freund wrote:
> On 2024-12-09 00:12:32 +0100, Tomas Vondra wrote:
> > [23:48:44.444](1.129s) ok 3 - reserved_connections limit
> > [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
> > process ended prematurely at
> > /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
> > line 154.
> > # Postmaster PID for node "primary" is 198592
> 
> 
> I just saw this failure on skink in the BF:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2015%3A43%3A23
> 
> [17:05:56.438](0.247s) ok 3 - reserved_connections limit
> [17:05:56.438](0.000s) ok 4 - reserved_connections limit: matches
> process ended prematurely at
/home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/BackgroundPsql.pmline 160.
 
> 
> 
> > That BackgroundPsql.pm line is this in wait_connect()
> > 
> >   $self->{run}->pump()
> >     until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
> 
> A big part of the problem here imo is the exception behaviour that
> IPC::Run::pump() has:
> 
>   If pump() is called after all harnessed activities have completed, a "process
>   ended prematurely" exception to be thrown.  This allows for simple scripting
>   of external applications without having to add lots of error handling code at
>   each step of the script:
> 
> Which is, uh, not very compatible with how we use IPC::Run (here and
> elsewhere).  Just ending the test because a connection failed is pretty awful.

Historically, I think we've avoided this sort of trouble by doing pipe I/O
only on processes where we feel able to predict when the process will exit.
Commit f44b9b6 is one example (simpler case, not involving pump()).  It would
be a nice improvement to do better, since there's always some risk of
unexpected exit.

> This behaviour makes it really hard to debug problems. It'd have been a lot
> easier to understand the problem if we'd seen psql's stderr before the test
> died.
> 
> I guess that mean at the very least we'd need to put an eval {} around the
> ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error?

That sounds right.

Officially, you could call ->pumpable() before ->pump().  It's defined as
'Returns TRUE if calling pump() won't throw an immediate "process ended
prematurely" exception.'  I lack high confidence that it avoids the exception,
because the pump() still calls pumpable()->reap_nb()->waitpid(WNOHANG) and may
decide "process ended prematurely" based on the new finding.  In other words,
I bet there would be a TOCTOU defect in "$h->pump if $h->pumpable".

> Presumably not just in in wait_connect(), but also at least in pump_until()?

If the goal is to have it capture maximum data from processes that exit when
we don't expect it (seems good to me), yes.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Add Pipelining support in psql
Next
From: Ashutosh Sharma
Date:
Subject: Re: Orphaned users in PG16 and above can only be managed by Superusers