Re: ssl tests aren't concurrency safe due to get_free_port() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: ssl tests aren't concurrency safe due to get_free_port()
Date
Msg-id 20221116015105.g6qxu2ugpj6agt26@awork3.anarazel.de
Whole thread Raw
In response to Re: ssl tests aren't concurrency safe due to get_free_port()  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: ssl tests aren't concurrency safe due to get_free_port()
List pgsql-hackers
Hi,

On 2022-11-15 15:56:37 -0500, Andrew Dunstan wrote:
> On 2022-11-06 Su 11:30, Andrew Dunstan wrote:
> >
> > One possible addition would be to add removing the reservation files in
> > an END handler. That would be pretty simple.
> >
> >
> 
> 
> Here's a version with that. I suggest we try it out and see if anything
> breaks.

Thanks! I agree it makes sense to go ahead with this.

I'd guess we should test drive this a bit in HEAD but eventually backpatch?


> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index d80134b26f..85fae32c14 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm

I think this should also update the comment for get_free_port(), since the
race mentioned there is largely addressed by this patch?


> @@ -140,6 +143,27 @@ INIT
>  
>      # Tracking of last port value assigned to accelerate free port lookup.
>      $last_port_assigned = int(rand() * 16384) + 49152;
> +
> +    # Set the port lock directory
> +
> +    # If we're told to use a directory (e.g. from a buildfarm client)
> +    # explicitly, use that
> +    $portdir = $ENV{PG_TEST_PORT_DIR};
> +    # Otherwise, try to use a directory at the top of the build tree
> +    if (! $portdir && $ENV{MESON_BUILD_ROOT})
> +    {
> +        $portdir = $ENV{MESON_BUILD_ROOT} . '/portlock'
> +    }
> +    elsif (! $portdir && ($ENV{TESTDATADIR} || "") =~ /\W(src|contrib)\W/p)
> +    {
> +        my $dir = ${^PREMATCH};
> +        $portdir = "$dir/portlock" if $dir;
> +    }
> +    # As a last resort use a directory under tmp_check
> +    $portdir ||= $PostgreSQL::Test::Utils::tmp_check . '/portlock';
> +    $portdir =~ s!\\!/!g;
> +    # Make sure the directory exists
> +    mkpath($portdir) unless -d $portdir;
>  }

Perhaps we should just export a directory in configure instead of this
guessing game?


>  =pod
> @@ -1505,6 +1529,7 @@ sub get_free_port
>                      last;
>                  }
>              }
> +            $found = _reserve_port($port) if $found;
>          }
>      }
>  
> @@ -1535,6 +1560,40 @@ sub can_bind
>      return $ret;
>  }
>  
> +# Internal routine to reserve a port number
> +# Returns 1 if successful, 0 if port is already reserved.
> +sub _reserve_port
> +{
> +    my $port = shift;
> +    # open in rw mode so we don't have to reopen it and lose the lock
> +    my $filename = "$portdir/$port.rsv";
> +    sysopen(my $portfile, $filename, O_RDWR|O_CREAT)
> +      || die "opening port file $filename";

Perhaps add $! to the message so e.g. permission denied errors or such are
easier to debug?


> +    # take an exclusive lock to avoid concurrent access
> +    flock($portfile, LOCK_EX) || die "locking port file $filename";

dito


> +    # see if someone else has or had a reservation of this port
> +    my $pid = <$portfile>;
> +    chomp $pid;
> +    if ($pid +0 > 0)
> +    {
> +        if (kill 0, $pid)
> +        {
> +            # process exists and is owned by us, so we can't reserve this port
> +            close($portfile);
> +            return 0;
> +        }
> +    }
> +    # All good, go ahead and reserve the port, first rewind and truncate.
> +    # If truncation fails it's not a tragedy, it just might leave some
> +    # trailing junk in the file that won't affect us.
> +    seek($portfile, 0, SEEK_SET);
> +    truncate($portfile, 0);

Perhaps check truncate's return value?


> +    print $portfile "$$\n";
> +    close($portfile);
> +    push(@port_reservation_files, $filename);
> +    return 1;
> +}

Perhaps it'd be better to release the file lock explicitly? flock() has this
annoying behaviour of only releasing the lock when the last file descriptor
for a file is closed. We shouldn't end up with dup'd FDs or forks here, but it
seems like it might be more robust to just explicitly release the lock?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: [DOCS] Stats views and functions not in order?
Next
From: Ian Lawrence Barwick
Date:
Subject: Re: Documentation for building with meson