On Tue, Mar 04, 2025 at 05:50:34PM -0500, Andres Freund wrote:
> On 2024-12-09 00:12:32 +0100, Tomas Vondra wrote:
> > [23:48:44.444](1.129s) ok 3 - reserved_connections limit
> > [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
> > process ended prematurely at
> > /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
> > line 154.
> > # Postmaster PID for node "primary" is 198592
>
>
> I just saw this failure on skink in the BF:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2015%3A43%3A23
>
> [17:05:56.438](0.247s) ok 3 - reserved_connections limit
> [17:05:56.438](0.000s) ok 4 - reserved_connections limit: matches
> process ended prematurely at
/home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/BackgroundPsql.pmline 160.
>
>
> > That BackgroundPsql.pm line is this in wait_connect()
> >
> > $self->{run}->pump()
> > until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
>
> A big part of the problem here imo is the exception behaviour that
> IPC::Run::pump() has:
>
> If pump() is called after all harnessed activities have completed, a "process
> ended prematurely" exception to be thrown. This allows for simple scripting
> of external applications without having to add lots of error handling code at
> each step of the script:
>
> Which is, uh, not very compatible with how we use IPC::Run (here and
> elsewhere). Just ending the test because a connection failed is pretty awful.
Historically, I think we've avoided this sort of trouble by doing pipe I/O
only on processes where we feel able to predict when the process will exit.
Commit f44b9b6 is one example (simpler case, not involving pump()). It would
be a nice improvement to do better, since there's always some risk of
unexpected exit.
> This behaviour makes it really hard to debug problems. It'd have been a lot
> easier to understand the problem if we'd seen psql's stderr before the test
> died.
>
> I guess that mean at the very least we'd need to put an eval {} around the
> ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error?
That sounds right.
Officially, you could call ->pumpable() before ->pump(). It's defined as
'Returns TRUE if calling pump() won't throw an immediate "process ended
prematurely" exception.' I lack high confidence that it avoids the exception,
because the pump() still calls pumpable()->reap_nb()->waitpid(WNOHANG) and may
decide "process ended prematurely" based on the new finding. In other words,
I bet there would be a TOCTOU defect in "$h->pump if $h->pumpable".
> Presumably not just in in wait_connect(), but also at least in pump_until()?
If the goal is to have it capture maximum data from processes that exit when
we don't expect it (seems good to me), yes.