Thread: ssl tests aren't concurrency safe due to get_free_port()

ssl tests aren't concurrency safe due to get_free_port()

From
Andres Freund
Date:
Hi,

See e.g. https://cirrus-ci.com/task/4682373060100096
2022-10-01 15:15:21.849 UTC [41962][postmaster] LOG:  could not bind IPv4 address "127.0.0.1": Address already in use
2022-10-01 15:15:21.849 UTC [41962][postmaster] HINT:  Is another postmaster already running on port 57003? If not,
waita few seconds and retry.
 
2022-10-01 15:15:21.849 UTC [41962][postmaster] WARNING:  could not create listen socket for "127.0.0.1"

I downloaded all test logs and grepping for 57003 shows the problem:

build/testrun/ldap/001_auth/log/regress_log_001_auth
3:# Checking port 57003
4:# Found port 57003
22:# Running: /usr/sbin/slapd -f /tmp/cirrus-ci-build/build/testrun/ldap/001_auth/data/slapd.conf -h
ldap://localhost:57002ldaps://localhost:57003
 

build/testrun/ldap/001_auth/log/001_auth_node.log
253:2022-10-01 15:15:25.103 UTC [42574][client backend] [[unknown]][3/1:0] DETAIL:  Connection matched pg_hba.conf line
1:"local all all ldap ldapurl="ldaps://localhost:57003/dc=example,dc=net??sub?(uid=$username)" ldaptls=1"
 

build/testrun/ssl/001_ssltests/log/regress_log_001_ssltests
2:# Checking port 57003
3:# Found port 57003
8:Connection string: port=57003 host=/tmp/1k5yhaWLQ1

build/testrun/ssl/001_ssltests/log/001_ssltests_primary.log
2:2022-10-01 15:15:20.668 UTC [41740][postmaster] LOG:  listening on Unix socket "/tmp/1k5yhaWLQ1/.s.PGSQL.57003"
58:2022-10-01 15:15:21.849 UTC [41962][postmaster] HINT:  Is another postmaster already running on port 57003? If not,
waita few seconds and retry.
 


I.e. we chose the same port for slapd as part of ldap's 001_auth.pl as for the
postgres instance of ssl's 001_ssltests.pl.


I don't think get_free_port() has any chance of being reliable as is. It's
fundamentally racy just among concurrently running tests, without even
considering things external to the tests (given it's using the range of ports
auto-assigned for client tcp ports...).


The current code is from 803466b6ffa, which said:
    This isn't 100% bulletproof, since
    conceivably something else on the machine could grab the port between
    the time we check and the time we actually start the server.  But that's
    a pretty short window, so in practice this should be good enough.

but I've seen this fail a couple times, so I suspect it's unfortunately not
good enough.


I can see a few potential ways of improving the situation:

1) Improve the port we start the search for a unique port from. We use the
   following bit right now:

    # Tracking of last port value assigned to accelerate free port lookup.
    $last_port_assigned = int(rand() * 16384) + 49152;

   It *might* be less likely that we hit conflicts if we start the search on a
   port based on the pid, rather than rand(). We e.g., could start at
   something like (($$ * 16) % 16384 + 49152), giving a decent likelihood that
   each test has 16 free ports.

   Perhaps also worth to increase the range of ports searched?


2) Use a lockfile containing a pid to protect the choice of a port within a
   build directory. Before accepting a port get_free_port() would check if the
   a lockfile exists for the port and if so, if the test using it is still
   alive.  That will protect against racyness between multiple tests inside a
   build directory, but won't protect against races between multiple builds
   running concurrently on a machine (e.g. on a buildfarm host)


3) We could generate unique port ranges for each test and thus remove the
   chance of conflicts within a builddir and substantially reduce the
   likelihood of conflicts between build directories (as the conflict would be
   between two tests in different build directories, rather than all tests).

   This would be easy to do in the meson build, but nontrivial in autoconf
   afaics.


4) Add retries to the tests that need get_free_port(). That seems hard to get
   right, because every single restart of postgres (or some other service that
   we used get_free_port()) provides the potential for a new conflict.


Greetings,

Andres Freund



Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-10-02 Su 12:49, Andres Freund wrote:
>
> 2) Use a lockfile containing a pid to protect the choice of a port within a
>    build directory. Before accepting a port get_free_port() would check if the
>    a lockfile exists for the port and if so, if the test using it is still
>    alive.  That will protect against racyness between multiple tests inside a
>    build directory, but won't protect against races between multiple builds
>    running concurrently on a machine (e.g. on a buildfarm host)
>
>

I think this is the right solution. To deal with the last issue, the
lockdir should be overrideable, like this:


  my $port_lockdir = $ENV{PG_PORT_LOCKDIR} || $build_dir;


Buildfarm animals could set this, probably to the global lockdir (see
run_branches.pl). Prior to that, buildfarm owners could do that manually.


There are numerous examples of lockfile code in the buildfarm sources.
I'll try to hack something up.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-10-04 Tu 01:39, Andrew Dunstan wrote:
> On 2022-10-02 Su 12:49, Andres Freund wrote:
>> 2) Use a lockfile containing a pid to protect the choice of a port within a
>>    build directory. Before accepting a port get_free_port() would check if the
>>    a lockfile exists for the port and if so, if the test using it is still
>>    alive.  That will protect against racyness between multiple tests inside a
>>    build directory, but won't protect against races between multiple builds
>>    running concurrently on a machine (e.g. on a buildfarm host)
>>
>>
> I think this is the right solution. To deal with the last issue, the
> lockdir should be overrideable, like this:
>
>
>   my $port_lockdir = $ENV{PG_PORT_LOCKDIR} || $build_dir;
>
>
> Buildfarm animals could set this, probably to the global lockdir (see
> run_branches.pl). Prior to that, buildfarm owners could do that manually.
>
>

The problem here is that Cluster.pm doesn't have any idea where the
build directory is, or even if there is one present at all.

meson does appear to let us know that, however, with MESON_BUILD_ROOT,
so probably the best thing would be to use PG_PORT_LOCKDIR if it's set,
otherwise MESON_BUILD_ROOT if it's set, otherwise the tmp_check
directory. If we want to backport to the make system we could export
top_builddir somewhere.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-10-17 Mo 10:59, Andrew Dunstan wrote:
> On 2022-10-04 Tu 01:39, Andrew Dunstan wrote:
>> On 2022-10-02 Su 12:49, Andres Freund wrote:
>>> 2) Use a lockfile containing a pid to protect the choice of a port within a
>>>    build directory. Before accepting a port get_free_port() would check if the
>>>    a lockfile exists for the port and if so, if the test using it is still
>>>    alive.  That will protect against racyness between multiple tests inside a
>>>    build directory, but won't protect against races between multiple builds
>>>    running concurrently on a machine (e.g. on a buildfarm host)
>>>
>>>
>> I think this is the right solution. To deal with the last issue, the
>> lockdir should be overrideable, like this:
>>
>>
>>   my $port_lockdir = $ENV{PG_PORT_LOCKDIR} || $build_dir;
>>
>>
>> Buildfarm animals could set this, probably to the global lockdir (see
>> run_branches.pl). Prior to that, buildfarm owners could do that manually.
>>
>>
> The problem here is that Cluster.pm doesn't have any idea where the
> build directory is, or even if there is one present at all.
>
> meson does appear to let us know that, however, with MESON_BUILD_ROOT,
> so probably the best thing would be to use PG_PORT_LOCKDIR if it's set,
> otherwise MESON_BUILD_ROOT if it's set, otherwise the tmp_check
> directory. If we want to backport to the make system we could export
> top_builddir somewhere.
>
>

Here's a patch which I think does the right thing.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment

Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-11-02 We 15:09, Andrew Dunstan wrote:
> On 2022-10-17 Mo 10:59, Andrew Dunstan wrote:
>> On 2022-10-04 Tu 01:39, Andrew Dunstan wrote:
>>> On 2022-10-02 Su 12:49, Andres Freund wrote:
>>>> 2) Use a lockfile containing a pid to protect the choice of a port within a
>>>>    build directory. Before accepting a port get_free_port() would check if the
>>>>    a lockfile exists for the port and if so, if the test using it is still
>>>>    alive.  That will protect against racyness between multiple tests inside a
>>>>    build directory, but won't protect against races between multiple builds
>>>>    running concurrently on a machine (e.g. on a buildfarm host)
>>>>
>>>>
>>> I think this is the right solution. To deal with the last issue, the
>>> lockdir should be overrideable, like this:
>>>
>>>
>>>   my $port_lockdir = $ENV{PG_PORT_LOCKDIR} || $build_dir;
>>>
>>>
>>> Buildfarm animals could set this, probably to the global lockdir (see
>>> run_branches.pl). Prior to that, buildfarm owners could do that manually.
>>>
>>>
>> The problem here is that Cluster.pm doesn't have any idea where the
>> build directory is, or even if there is one present at all.
>>
>> meson does appear to let us know that, however, with MESON_BUILD_ROOT,
>> so probably the best thing would be to use PG_PORT_LOCKDIR if it's set,
>> otherwise MESON_BUILD_ROOT if it's set, otherwise the tmp_check
>> directory. If we want to backport to the make system we could export
>> top_builddir somewhere.
>>
>>
> Here's a patch which I think does the right thing.
>
>
>

Updated with a couple of thinkos fixed.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment

Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andres Freund
Date:
Hi,

On 2022-11-03 14:16:51 -0400, Andrew Dunstan wrote:
> > Here's a patch which I think does the right thing.
> Updated with a couple of thinkos fixed.

Thanks!


> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index d80134b26f..aceca353d3 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -93,9 +93,9 @@ use warnings;
>  
>  use Carp;
>  use Config;
> -use Fcntl qw(:mode);
> +use Fcntl qw(:mode :flock :seek O_CREAT O_RDWR);

Does this do anything useful on windows?



>  # the minimum version we believe to be compatible with this package without
>  # subclassing.
> @@ -140,6 +140,27 @@ INIT
>  
>      # Tracking of last port value assigned to accelerate free port lookup.
>      $last_port_assigned = int(rand() * 16384) + 49152;
> +
> +    # Set the port lock directory
> +
> +    # If we're told to use a directory (e.g. from a buildfarm client)
> +    # explicitly, use that
> +    $portdir = $ENV{PG_TEST_PORT_DIR};
> +    # Otherwise, try to use a directory at the top of the build tree
> +    if (! $portdir && $ENV{MESON_BUILD_ROOT})
> +    {
> +        $portdir = $ENV{MESON_BUILD_ROOT} . '/portlock'
> +    }
> +    elsif (! $portdir && ($ENV{TESTDATADIR} || "") =~ /\W(src|contrib)\W/p)
> +    {
> +        my $dir = ${^PREMATCH};
> +        $portdir = "$dir/portlock" if $dir;
> +    }
> +    # As a last resort use a directory under tmp_check
> +    $portdir ||= $PostgreSQL::Test::Utils::tmp_check . '/portlock';
> +    $portdir =~ s!\\!/!g;
> +    # Make sure the directory exists
> +    mkpath($portdir) unless -d $portdir;
>  }
>  
>  =pod
> @@ -1505,6 +1526,7 @@ sub get_free_port
>                      last;
>                  }
>              }
> +            $found = _reserve_port($port) if $found;
>          }
>      }
>  
> @@ -1535,6 +1557,38 @@ sub can_bind
>      return $ret;
>  }
>  
> +# Internal routine to reserve a port number
> +# Returns 1 if successful, 0 if port is already reserved.
> +sub _reserve_port
> +{
> +    my $port = shift;
> +    # open in rw mode so we don't have to reopen it and lose the lock
> +    sysopen(my $portfile, "$portdir/$port.rsv", O_RDWR|O_CREAT)
> +      || die "opening port file";
> +    # take an exclusive lock to avoid concurrent access
> +    flock($portfile, LOCK_EX) || die "locking port file";
> +    # see if someone else has or had a reservation of this port
> +    my $pid = <$portfile>;
> +    chomp $pid;
> +    if ($pid +0 > 0)

Gotta love perl.


> +    {
> +        if (kill 0, $pid)

Does this work on windows?


Greetings,

Andres Freund



Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-11-05 Sa 14:36, Andres Freund wrote:
>>  
>>  use Carp;
>>  use Config;
>> -use Fcntl qw(:mode);
>> +use Fcntl qw(:mode :flock :seek O_CREAT O_RDWR);
> Does this do anything useful on windows?


All we're doing here on Windows and elsewhere is getting access to some
constants used in calls to flock(), seek() and sysopen(). It's not
actually doing anything else anywhere.


>
>> +    if ($pid +0 > 0)
> Gotta love perl.


Think of it as a typecast.


>
>
>> +    {
>> +        if (kill 0, $pid)
> Does this work on windows?
>
Yes, it's supposed to. It doesn't actually send a signal, it checks if
the process exists. There's some suggestion it might give false
positives on Windows, but that won't really hurt us here, we'll just
look for a different port.

One possible addition would be to add removing the reservation files in
an END handler. That would be pretty simple.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-11-06 Su 11:30, Andrew Dunstan wrote:
>
> One possible addition would be to add removing the reservation files in
> an END handler. That would be pretty simple.
>
>


Here's a version with that. I suggest we try it out and see if anything
breaks.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment

Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andres Freund
Date:
Hi,

On 2022-11-15 15:56:37 -0500, Andrew Dunstan wrote:
> On 2022-11-06 Su 11:30, Andrew Dunstan wrote:
> >
> > One possible addition would be to add removing the reservation files in
> > an END handler. That would be pretty simple.
> >
> >
> 
> 
> Here's a version with that. I suggest we try it out and see if anything
> breaks.

Thanks! I agree it makes sense to go ahead with this.

I'd guess we should test drive this a bit in HEAD but eventually backpatch?


> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index d80134b26f..85fae32c14 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm

I think this should also update the comment for get_free_port(), since the
race mentioned there is largely addressed by this patch?


> @@ -140,6 +143,27 @@ INIT
>  
>      # Tracking of last port value assigned to accelerate free port lookup.
>      $last_port_assigned = int(rand() * 16384) + 49152;
> +
> +    # Set the port lock directory
> +
> +    # If we're told to use a directory (e.g. from a buildfarm client)
> +    # explicitly, use that
> +    $portdir = $ENV{PG_TEST_PORT_DIR};
> +    # Otherwise, try to use a directory at the top of the build tree
> +    if (! $portdir && $ENV{MESON_BUILD_ROOT})
> +    {
> +        $portdir = $ENV{MESON_BUILD_ROOT} . '/portlock'
> +    }
> +    elsif (! $portdir && ($ENV{TESTDATADIR} || "") =~ /\W(src|contrib)\W/p)
> +    {
> +        my $dir = ${^PREMATCH};
> +        $portdir = "$dir/portlock" if $dir;
> +    }
> +    # As a last resort use a directory under tmp_check
> +    $portdir ||= $PostgreSQL::Test::Utils::tmp_check . '/portlock';
> +    $portdir =~ s!\\!/!g;
> +    # Make sure the directory exists
> +    mkpath($portdir) unless -d $portdir;
>  }

Perhaps we should just export a directory in configure instead of this
guessing game?


>  =pod
> @@ -1505,6 +1529,7 @@ sub get_free_port
>                      last;
>                  }
>              }
> +            $found = _reserve_port($port) if $found;
>          }
>      }
>  
> @@ -1535,6 +1560,40 @@ sub can_bind
>      return $ret;
>  }
>  
> +# Internal routine to reserve a port number
> +# Returns 1 if successful, 0 if port is already reserved.
> +sub _reserve_port
> +{
> +    my $port = shift;
> +    # open in rw mode so we don't have to reopen it and lose the lock
> +    my $filename = "$portdir/$port.rsv";
> +    sysopen(my $portfile, $filename, O_RDWR|O_CREAT)
> +      || die "opening port file $filename";

Perhaps add $! to the message so e.g. permission denied errors or such are
easier to debug?


> +    # take an exclusive lock to avoid concurrent access
> +    flock($portfile, LOCK_EX) || die "locking port file $filename";

dito


> +    # see if someone else has or had a reservation of this port
> +    my $pid = <$portfile>;
> +    chomp $pid;
> +    if ($pid +0 > 0)
> +    {
> +        if (kill 0, $pid)
> +        {
> +            # process exists and is owned by us, so we can't reserve this port
> +            close($portfile);
> +            return 0;
> +        }
> +    }
> +    # All good, go ahead and reserve the port, first rewind and truncate.
> +    # If truncation fails it's not a tragedy, it just might leave some
> +    # trailing junk in the file that won't affect us.
> +    seek($portfile, 0, SEEK_SET);
> +    truncate($portfile, 0);

Perhaps check truncate's return value?


> +    print $portfile "$$\n";
> +    close($portfile);
> +    push(@port_reservation_files, $filename);
> +    return 1;
> +}

Perhaps it'd be better to release the file lock explicitly? flock() has this
annoying behaviour of only releasing the lock when the last file descriptor
for a file is closed. We shouldn't end up with dup'd FDs or forks here, but it
seems like it might be more robust to just explicitly release the lock?

Greetings,

Andres Freund



Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-11-15 Tu 20:51, Andres Freund wrote:
>> @@ -140,6 +143,27 @@ INIT
>>  
>>      # Tracking of last port value assigned to accelerate free port lookup.
>>      $last_port_assigned = int(rand() * 16384) + 49152;
>> +
>> +    # Set the port lock directory
>> +
>> +    # If we're told to use a directory (e.g. from a buildfarm client)
>> +    # explicitly, use that
>> +    $portdir = $ENV{PG_TEST_PORT_DIR};
>> +    # Otherwise, try to use a directory at the top of the build tree
>> +    if (! $portdir && $ENV{MESON_BUILD_ROOT})
>> +    {
>> +        $portdir = $ENV{MESON_BUILD_ROOT} . '/portlock'
>> +    }
>> +    elsif (! $portdir && ($ENV{TESTDATADIR} || "") =~ /\W(src|contrib)\W/p)
>> +    {
>> +        my $dir = ${^PREMATCH};
>> +        $portdir = "$dir/portlock" if $dir;
>> +    }
>> +    # As a last resort use a directory under tmp_check
>> +    $portdir ||= $PostgreSQL::Test::Utils::tmp_check . '/portlock';
>> +    $portdir =~ s!\\!/!g;
>> +    # Make sure the directory exists
>> +    mkpath($portdir) unless -d $portdir;
>>  }
> Perhaps we should just export a directory in configure instead of this
> guessing game?
>
>
>

I think the obvious candidate would be to export top_builddir from
src/Makefile.global. That would remove the need to infer it from
TESTDATADIR.


Any objections?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andres Freund
Date:
Hi,

On 2022-11-19 10:56:33 -0500, Andrew Dunstan wrote:
> > Perhaps we should just export a directory in configure instead of this
> > guessing game?
> 
> I think the obvious candidate would be to export top_builddir from
> src/Makefile.global. That would remove the need to infer it from
> TESTDATADIR.

I think that'd be good. I'd perhaps rename it in the process so it's
exported uppercase, but whatever...

Greetings,

Andres Freund



Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-11-19 Sa 15:16, Andres Freund wrote:
> Hi,
>
> On 2022-11-19 10:56:33 -0500, Andrew Dunstan wrote:
>>> Perhaps we should just export a directory in configure instead of this
>>> guessing game?
>> I think the obvious candidate would be to export top_builddir from
>> src/Makefile.global. That would remove the need to infer it from
>> TESTDATADIR.
> I think that'd be good. I'd perhaps rename it in the process so it's
> exported uppercase, but whatever...
>

OK, pushed with a little more tweaking. I didn't upcase top_builddir
because the existing prove_installcheck recipes already export it and I
wanted to stay consistent with those.

If it works ok I will backpatch in couple of days.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andres Freund
Date:
On 2022-11-20 10:10:38 -0500, Andrew Dunstan wrote:
> OK, pushed with a little more tweaking.

Thanks!


> I didn't upcase top_builddir
> because the existing prove_installcheck recipes already export it and I
> wanted to stay consistent with those.

Makes sense.


> If it works ok I will backpatch in couple of days.

+1



Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-11-20 Su 14:05, Andres Freund wrote:
>> If it works ok I will backpatch in couple of days.
> +1



Done.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andres Freund
Date:
Hi,

On 2022-11-22 10:57:41 -0500, Andrew Dunstan wrote:
> On 2022-11-20 Su 14:05, Andres Freund wrote:
> >> If it works ok I will backpatch in couple of days.
> > +1
> Done.

While looking into a weird buildfarm failure ([1]), I noticed this:

# Checking port 62707
Use of uninitialized value $pid in scalar chomp at
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pmline 1247.
 
Use of uninitialized value $pid in addition (+) at
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pmline 1248.
 

This isn't related the failure afaics. I think it's happening for all runs on
all branches on my host. And also a few other animals [2].

Not quite sure how $pid ends up uninitialized, given the code:
    # see if someone else has or had a reservation of this port
    my $pid = <$portfile>;
    chomp $pid;
    if ($pid +0 > 0)

Greetings,

Andres Freund


[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2022-11-22%2016%3A33%3A57

The main symptom is
# Running: pg_ctl -D
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/src/bin/pg_ctl/tmp_check/t_003_promote_standby2_data/pgdata
promote
waiting for server to promote....
pg_ctl: control file appears to be corrupt

[2]
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=peripatus&dt=2022-11-23%2000%3A20%3A13&stg=pg_ctl-check



Re: ssl tests aren't concurrency safe due to get_free_port()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> While looking into a weird buildfarm failure ([1]), I noticed this:

> # Checking port 62707
> Use of uninitialized value $pid in scalar chomp at
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pmline 1247. 
> Use of uninitialized value $pid in addition (+) at
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pmline 1248. 

Yeah, my animals are showing that too.

> Not quite sure how $pid ends up uninitialized, given the code:
>     # see if someone else has or had a reservation of this port
>     my $pid = <$portfile>;
>     chomp $pid;
>     if ($pid +0 > 0)

I guess the <$portfile> might return undef if the file is empty?

            regards, tom lane



Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
> On Nov 22, 2022, at 8:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
>> While looking into a weird buildfarm failure ([1]), I noticed this:
>
>> # Checking port 62707
>> Use of uninitialized value $pid in scalar chomp at
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pmline 1247. 
>> Use of uninitialized value $pid in addition (+) at
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pmline 1248. 
>
> Yeah, my animals are showing that too.
>
>> Not quite sure how $pid ends up uninitialized, given the code:
>>    # see if someone else has or had a reservation of this port
>>    my $pid = <$portfile>;
>>    chomp $pid;
>>    if ($pid +0 > 0)
>
> I guess the <$portfile> might return undef if the file is empty?
>

Probably, will fix in the morning

Cheers

Andrew


Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:
On 2022-11-22 Tu 20:36, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> While looking into a weird buildfarm failure ([1]), I noticed this:
>> # Checking port 62707
>> Use of uninitialized value $pid in scalar chomp at
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pmline 1247.
 
>> Use of uninitialized value $pid in addition (+) at
/mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pmline 1248.
 
> Yeah, my animals are showing that too.
>
>> Not quite sure how $pid ends up uninitialized, given the code:
>>     # see if someone else has or had a reservation of this port
>>     my $pid = <$portfile>;
>>     chomp $pid;
>>     if ($pid +0 > 0)
> I guess the <$portfile> might return undef if the file is empty?
>
>         


Yeah, should be fixed now.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Peter Eisentraut
Date:
On 20.11.22 16:10, Andrew Dunstan wrote:
> 
> On 2022-11-19 Sa 15:16, Andres Freund wrote:
>> Hi,
>>
>> On 2022-11-19 10:56:33 -0500, Andrew Dunstan wrote:
>>>> Perhaps we should just export a directory in configure instead of this
>>>> guessing game?
>>> I think the obvious candidate would be to export top_builddir from
>>> src/Makefile.global. That would remove the need to infer it from
>>> TESTDATADIR.
>> I think that'd be good. I'd perhaps rename it in the process so it's
>> exported uppercase, but whatever...
>>
> 
> OK, pushed with a little more tweaking. I didn't upcase top_builddir
> because the existing prove_installcheck recipes already export it and I
> wanted to stay consistent with those.
> 
> If it works ok I will backpatch in couple of days.

These patches have affected pgxs-using extensions that have their own 
TAP tests.

The portlock directory is created at

     my $build_dir = $ENV{top_builddir}
       || $PostgreSQL::Test::Utils::tmp_check ;
     $portdir ||= "$build_dir/portlock";

but for a pgxs user, top_builddir points into the installation tree, 
specifically at $prefix/lib/pgxs/.

So when running "make installcheck" for an extension, we either won't 
have write access to that directory, or if we do, then it's still not 
good to write into the installation tree during a test suite.

A possible fix is

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 5dacc4d838..c493d1a60c 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -464,7 +464,7 @@ rm -rf '$(CURDIR)'/tmp_check && \
  $(MKDIR_P) '$(CURDIR)'/tmp_check && \
  cd $(srcdir) && \
     TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
-   PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
+   PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)' \
     PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
     $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if 
$(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
  endef




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Peter Eisentraut
Date:
On 25.04.23 12:27, Peter Eisentraut wrote:
> These patches have affected pgxs-using extensions that have their own 
> TAP tests.
> 
> The portlock directory is created at
> 
>      my $build_dir = $ENV{top_builddir}
>        || $PostgreSQL::Test::Utils::tmp_check ;
>      $portdir ||= "$build_dir/portlock";
> 
> but for a pgxs user, top_builddir points into the installation tree, 
> specifically at $prefix/lib/pgxs/.
> 
> So when running "make installcheck" for an extension, we either won't 
> have write access to that directory, or if we do, then it's still not 
> good to write into the installation tree during a test suite.
> 
> A possible fix is
> 
> diff --git a/src/Makefile.global.in b/src/Makefile.global.in
> index 5dacc4d838..c493d1a60c 100644
> --- a/src/Makefile.global.in
> +++ b/src/Makefile.global.in
> @@ -464,7 +464,7 @@ rm -rf '$(CURDIR)'/tmp_check && \
>   $(MKDIR_P) '$(CURDIR)'/tmp_check && \
>   cd $(srcdir) && \
>      TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
> -   PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
> +   PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)' \
>      PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
>      $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if 
> $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
>   endef

Any thoughts on this?  I would like to get this into the upcoming minor 
releases.

Note that the piece of code shown here is only applicable to PGXS, so 
given that that is currently broken, this "can't make it worse".




Re: ssl tests aren't concurrency safe due to get_free_port()

From
Peter Eisentraut
Date:
On 25.04.23 12:27, Peter Eisentraut wrote:
> On 20.11.22 16:10, Andrew Dunstan wrote:
>>
>> On 2022-11-19 Sa 15:16, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2022-11-19 10:56:33 -0500, Andrew Dunstan wrote:
>>>>> Perhaps we should just export a directory in configure instead of this
>>>>> guessing game?
>>>> I think the obvious candidate would be to export top_builddir from
>>>> src/Makefile.global. That would remove the need to infer it from
>>>> TESTDATADIR.
>>> I think that'd be good. I'd perhaps rename it in the process so it's
>>> exported uppercase, but whatever...
>>>
>>
>> OK, pushed with a little more tweaking. I didn't upcase top_builddir
>> because the existing prove_installcheck recipes already export it and I
>> wanted to stay consistent with those.
>>
>> If it works ok I will backpatch in couple of days.
> 
> These patches have affected pgxs-using extensions that have their own 
> TAP tests.
> 
> The portlock directory is created at
> 
>      my $build_dir = $ENV{top_builddir}
>        || $PostgreSQL::Test::Utils::tmp_check ;
>      $portdir ||= "$build_dir/portlock";
> 
> but for a pgxs user, top_builddir points into the installation tree, 
> specifically at $prefix/lib/pgxs/.
> 
> So when running "make installcheck" for an extension, we either won't 
> have write access to that directory, or if we do, then it's still not 
> good to write into the installation tree during a test suite.
> 
> A possible fix is
> 
> diff --git a/src/Makefile.global.in b/src/Makefile.global.in
> index 5dacc4d838..c493d1a60c 100644
> --- a/src/Makefile.global.in
> +++ b/src/Makefile.global.in
> @@ -464,7 +464,7 @@ rm -rf '$(CURDIR)'/tmp_check && \
>   $(MKDIR_P) '$(CURDIR)'/tmp_check && \
>   cd $(srcdir) && \
>      TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
> -   PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
> +   PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)' \
>      PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
>      $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if 
> $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
>   endef

Better idea: Remove the top_builddir assignment altogether.  I traced 
the history of this, and it seems like it was just dragged around with 
various other changes and doesn't have a purpose of its own.

The only effect of the current code (top_builddir='$(top_builddir)') is 
to export top_builddir as an environment variable.  And the only Perl 
test code that reads that environment variable is the code that makes 
the portlock directory, which is exactly what we don't want.  So just 
removing that seems to be the right solution.

Attachment

Re: ssl tests aren't concurrency safe due to get_free_port()

From
Andrew Dunstan
Date:


On 2023-05-04 Th 02:40, Peter Eisentraut wrote:
On 25.04.23 12:27, Peter Eisentraut wrote:
On 20.11.22 16:10, Andrew Dunstan wrote:

On 2022-11-19 Sa 15:16, Andres Freund wrote:
Hi,

On 2022-11-19 10:56:33 -0500, Andrew Dunstan wrote:
Perhaps we should just export a directory in configure instead of this
guessing game?
I think the obvious candidate would be to export top_builddir from
src/Makefile.global. That would remove the need to infer it from
TESTDATADIR.
I think that'd be good. I'd perhaps rename it in the process so it's
exported uppercase, but whatever...


OK, pushed with a little more tweaking. I didn't upcase top_builddir
because the existing prove_installcheck recipes already export it and I
wanted to stay consistent with those.

If it works ok I will backpatch in couple of days.

These patches have affected pgxs-using extensions that have their own TAP tests.

The portlock directory is created at

     my $build_dir = $ENV{top_builddir}
       || $PostgreSQL::Test::Utils::tmp_check ;
     $portdir ||= "$build_dir/portlock";

but for a pgxs user, top_builddir points into the installation tree, specifically at $prefix/lib/pgxs/.

So when running "make installcheck" for an extension, we either won't have write access to that directory, or if we do, then it's still not good to write into the installation tree during a test suite.

A possible fix is

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 5dacc4d838..c493d1a60c 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -464,7 +464,7 @@ rm -rf '$(CURDIR)'/tmp_check && \
  $(MKDIR_P) '$(CURDIR)'/tmp_check && \
  cd $(srcdir) && \
     TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
-   PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
+   PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)' \
     PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
     $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
  endef

Better idea: Remove the top_builddir assignment altogether.  I traced the history of this, and it seems like it was just dragged around with various other changes and doesn't have a purpose of its own.

The only effect of the current code (top_builddir='$(top_builddir)') is to export top_builddir as an environment variable.  And the only Perl test code that reads that environment variable is the code that makes the portlock directory, which is exactly what we don't want.  So just removing that seems to be the right solution.



Yeah, that should be OK in the pgxs case.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: ssl tests aren't concurrency safe due to get_free_port()

From
Peter Eisentraut
Date:
On 05.05.23 00:33, Andrew Dunstan wrote:
> 
> On 2023-05-04 Th 02:40, Peter Eisentraut wrote:
>> On 25.04.23 12:27, Peter Eisentraut wrote:
>>> On 20.11.22 16:10, Andrew Dunstan wrote:
>>>>
>>>> On 2022-11-19 Sa 15:16, Andres Freund wrote:
>>>>> Hi,
>>>>>
>>>>> On 2022-11-19 10:56:33 -0500, Andrew Dunstan wrote:
>>>>>>> Perhaps we should just export a directory in configure instead of 
>>>>>>> this
>>>>>>> guessing game?
>>>>>> I think the obvious candidate would be to export top_builddir from
>>>>>> src/Makefile.global. That would remove the need to infer it from
>>>>>> TESTDATADIR.
>>>>> I think that'd be good. I'd perhaps rename it in the process so it's
>>>>> exported uppercase, but whatever...
>>>>>
>>>>
>>>> OK, pushed with a little more tweaking. I didn't upcase top_builddir
>>>> because the existing prove_installcheck recipes already export it and I
>>>> wanted to stay consistent with those.
>>>>
>>>> If it works ok I will backpatch in couple of days.
>>>
>>> These patches have affected pgxs-using extensions that have their own 
>>> TAP tests.
>>>
>>> The portlock directory is created at
>>>
>>>      my $build_dir = $ENV{top_builddir}
>>>        || $PostgreSQL::Test::Utils::tmp_check ;
>>>      $portdir ||= "$build_dir/portlock";
>>>
>>> but for a pgxs user, top_builddir points into the installation tree, 
>>> specifically at $prefix/lib/pgxs/.
>>>
>>> So when running "make installcheck" for an extension, we either won't 
>>> have write access to that directory, or if we do, then it's still not 
>>> good to write into the installation tree during a test suite.
>>>
>>> A possible fix is
>>>
>>> diff --git a/src/Makefile.global.in b/src/Makefile.global.in
>>> index 5dacc4d838..c493d1a60c 100644
>>> --- a/src/Makefile.global.in
>>> +++ b/src/Makefile.global.in
>>> @@ -464,7 +464,7 @@ rm -rf '$(CURDIR)'/tmp_check && \
>>>   $(MKDIR_P) '$(CURDIR)'/tmp_check && \
>>>   cd $(srcdir) && \
>>>      TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \
>>> -   PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \
>>> +   PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)' \
>>>      PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \
>>>      $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if 
>>> $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl)
>>>   endef
>>
>> Better idea: Remove the top_builddir assignment altogether.  I traced 
>> the history of this, and it seems like it was just dragged around with 
>> various other changes and doesn't have a purpose of its own.
>>
>> The only effect of the current code (top_builddir='$(top_builddir)') 
>> is to export top_builddir as an environment variable.  And the only 
>> Perl test code that reads that environment variable is the code that 
>> makes the portlock directory, which is exactly what we don't want.  So 
>> just removing that seems to be the right solution.
> 
> Yeah, that should be OK in the pgxs case.

I have committed this to all active branches.