Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Refactoring postmaster's code to cleanup after child exit
Date
Msg-id sllzoxbck2wlnbqcqa36i7fge4l6rkiwy4lp56tmn556b4phhy@45ogfqbsd32i
Whole thread Raw
In response to Re: Refactoring postmaster's code to cleanup after child exit  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
Hi,

On 2024-12-09 00:12:32 +0100, Tomas Vondra wrote:
> Hi, the TAP test 001_connection_limits.pl introduced by 6a1d0d470e84
> seems to have problems with valgrind :-( I reliably get this failure:
> 
> 
> t/001_connection_limits.pl .. 3/? # Tests were run but no plan was
> declared and done_testing() was not seen.
> # Looks like your test exited with 29 just after 4.
> t/001_connection_limits.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
> All 4 subtests passed
> 
> 
> and tmp_check/log/regress_log_001_connection_limits says:
> 
> 
> [23:48:44.444](1.129s) ok 3 - reserved_connections limit
> [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
> process ended prematurely at
> /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
> line 154.
> # Postmaster PID for node "primary" is 198592


I just saw this failure on skink in the BF:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2015%3A43%3A23

[17:05:56.438](0.247s) ok 3 - reserved_connections limit
[17:05:56.438](0.000s) ok 4 - reserved_connections limit: matches
process ended prematurely at /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line160.
 


> That BackgroundPsql.pm line is this in wait_connect()
> 
>   $self->{run}->pump()
>     until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;

A big part of the problem here imo is the exception behaviour that
IPC::Run::pump() has:

  If pump() is called after all harnessed activities have completed, a "process
  ended prematurely" exception to be thrown.  This allows for simple scripting
  of external applications without having to add lots of error handling code at
  each step of the script:

Which is, uh, not very compatible with how we use IPC::Run (here and
elsewhere).  Just ending the test because a connection failed is pretty awful.


This behaviour makes it really hard to debug problems. It'd have been a lot
easier to understand the problem if we'd seen psql's stderr before the test
died.

I guess that mean at the very least we'd need to put an eval {} around the
->pump() call., print $self->{stdout}, ->{stderr} and reraise an error?
Presumably not just in in wait_connect(), but also at least in pump_until()?



Will respond downthread to a potential workaround for the issue.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Next
From: Andrew Dunstan
Date:
Subject: Re: scalability bottlenecks with (many) partitions (and more)