Thread: port conflicts when running tests concurrently on windows.

port conflicts when running tests concurrently on windows.

From
Andres Freund
Date:
Hi,

With the meson patch applied the tests on windows run concurrently by
default. Unfortunately that fails semi-regularly. The reason for this
basically is that windows defaults to using TCP in tests, and that the
tap-test port determination is very racy:

>       # When selecting a port, we look for an unassigned TCP port number,
>       # even if we intend to use only Unix-domain sockets.  This is clearly
>       # necessary on $use_tcp (Windows) configurations, and it seems like a
>       # good idea on Unixen as well.
>       $port = get_free_port();
> ...
> =item get_free_port()
>
> Locate an unprivileged (high) TCP port that's not currently bound to
> anything.  This is used by C<new()>, and also by some test cases that need to
> start other, non-Postgres servers.
>
> Ports assigned to existing PostgreSQL::Test::Cluster objects are automatically
> excluded, even if those servers are not currently running.
>
> XXX A port available now may become unavailable by the time we start
> the desired service.

I don't think there's an easy way to make this race-free. We'd need to teach
postmaster to use pre-opened socket or something like that.

An alternative to that would be to specify a base port number externally. In
the meson branch I already did that for the pg_regress style tests, since they
don't have the automatic port thing above. But for tap tests there's currently
no way to pass in a base-port that I can see.


Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?

Greetings,

Andres Freund



Re: port conflicts when running tests concurrently on windows.

From
Andres Freund
Date:
Hi,

On 2021-12-08 14:45:50 -0800, Andres Freund wrote:
> Is it perhaps time to to use unix sockets on windows by default
> (i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?

On its own PG_TEST_USE_UNIX_SOCKETS doesn't work at all on windows - it fails
trying to use /tmp/ as a socket directory. Using PG_REGRESS_SOCK_DIR fixes
that for PG_REGRESS. But the tap tests don't look at that :(.

Greetings,

Andres Freund



Re: port conflicts when running tests concurrently on windows.

From
Andres Freund
Date:
Hi,

On 2021-12-08 16:36:14 -0800, Andres Freund wrote:
> On 2021-12-08 14:45:50 -0800, Andres Freund wrote:
> > Is it perhaps time to to use unix sockets on windows by default
> > (i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?
> 
> On its own PG_TEST_USE_UNIX_SOCKETS doesn't work at all on windows - it fails
> trying to use /tmp/ as a socket directory. Using PG_REGRESS_SOCK_DIR fixes
> that for PG_REGRESS. But the tap tests don't look at that :(.

The tap failures in turn are caused by the Cluster.pm choosing a socket
directory with backslashes. Those backslashes are then treated as an escape
character both by guc.c and libpq.

I think this can be addressed by something like

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 9467a199c8f..c2a8487bbab 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -119,7 +119,15 @@ INIT
     $use_tcp            = !$PostgreSQL::Test::Utils::use_unix_sockets;
     $test_localhost     = "127.0.0.1";
     $last_host_assigned = 1;
-    $test_pghost        = $use_tcp ? $test_localhost : PostgreSQL::Test::Utils::tempdir_short;
+    if ($use_tcp)
+    {
+        $test_pghost = $test_localhost;
+    }
+    else
+    {
+        $test_pghost = PostgreSQL::Test::Utils::tempdir_short;
+        $test_pghost =~ s!\\!/!g if $PostgreSQL::Test::Utils::windows_os;
+    }
     $ENV{PGHOST}        = $test_pghost;
     $ENV{PGDATABASE}    = 'postgres';
 

I wonder if we need a host2unix() helper accompanying perl2host()? Seems nicer
than sprinkling s!\\!/!g if $PostgreSQL::Test::Utils::windows_os in a growing
number of places...

Greetings,

Andres Freund



Re: port conflicts when running tests concurrently on windows.

From
Andres Freund
Date:
Hi,

On 2021-12-08 17:03:07 -0800, Andres Freund wrote:
> On 2021-12-08 16:36:14 -0800, Andres Freund wrote:
> > On 2021-12-08 14:45:50 -0800, Andres Freund wrote:
> > > Is it perhaps time to to use unix sockets on windows by default
> > > (i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?
> > 
> > On its own PG_TEST_USE_UNIX_SOCKETS doesn't work at all on windows - it fails
> > trying to use /tmp/ as a socket directory. Using PG_REGRESS_SOCK_DIR fixes
> > that for PG_REGRESS. But the tap tests don't look at that :(.
> 
> The tap failures in turn are caused by the Cluster.pm choosing a socket
> directory with backslashes. Those backslashes are then treated as an escape
> character both by guc.c and libpq.
> 
> I think this can be addressed by something like
> [...]

That indeed seems to pass [1] where it previously failed [2]. There's another
failure that I haven't diagnosed yet, but it's independent of tcp vs unix sockets.

[1] https://cirrus-ci.com/task/5055530235330560?logs=ssl_test#L5
[2] https://cirrus-ci.com/task/5524596901281792?logs=ssl_test#L5

Greetings,

Andres Freund



Re: port conflicts when running tests concurrently on windows.

From
Thomas Munro
Date:
On Thu, Dec 9, 2021 at 11:46 AM Andres Freund <andres@anarazel.de> wrote:
> Is it perhaps time to to use unix sockets on windows by default
> (i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?

Makes sense.  As a data point, it looks like this feature is in all
supported releases of Windows.  It arrived in 1803, already EOL'd, and
IIUC even a Windows Server 2016 "LTSC" system that's been disconnected
from the internet and refusing all updates reaches "mainstream EOL"
next month.  (Not a Windows person myself, but I've been looking at
this stuff while contemplating various filesystem-related changes...
there it's a little murkier, you need a WSL1-era kernel *and* you need
to be running on top of local NTFS, which is harder for us to expect.)

https://docs.microsoft.com/en-us/lifecycle/products/windows-10-enterprise-and-education
https://docs.microsoft.com/en-us/windows-server/get-started/windows-server-release-info
https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions



Re: port conflicts when running tests concurrently on windows.

From
Peter Eisentraut
Date:
On 09.12.21 03:44, Thomas Munro wrote:
> On Thu, Dec 9, 2021 at 11:46 AM Andres Freund<andres@anarazel.de>  wrote:
>> Is it perhaps time to to use unix sockets on windows by default
>> (i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?

Makes sense to get this to work, at least as an option.

> Makes sense.  As a data point, it looks like this feature is in all
> supported releases of Windows.  It arrived in 1803, already EOL'd, and
> IIUC even a Windows Server 2016 "LTSC" system that's been disconnected
> from the internet and refusing all updates reaches "mainstream EOL"
> next month.

I believe the "18" in "1803" refers to 2018.  We have Windows buildfarm 
members that mention 2016 and 2017 in their title.  Would those be in 
trouble?



Re: port conflicts when running tests concurrently on windows.

From
Andrew Dunstan
Date:
On 12/9/21 08:35, Peter Eisentraut wrote:
> On 09.12.21 03:44, Thomas Munro wrote:
>> On Thu, Dec 9, 2021 at 11:46 AM Andres Freund<andres@anarazel.de> 
>> wrote:
>>> Is it perhaps time to to use unix sockets on windows by default
>>> (i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?
>
> Makes sense to get this to work, at least as an option.
>
>> Makes sense.  As a data point, it looks like this feature is in all
>> supported releases of Windows.  It arrived in 1803, already EOL'd, and
>> IIUC even a Windows Server 2016 "LTSC" system that's been disconnected
>> from the internet and refusing all updates reaches "mainstream EOL"
>> next month.
>
> I believe the "18" in "1803" refers to 2018.  We have Windows
> buildfarm members that mention 2016 and 2017 in their title.  Would
> those be in trouble?


Probably not if they have been updated. I have Windows machines
substantially older that 2018 but now running versions dated later.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: port conflicts when running tests concurrently on windows.

From
Andrew Dunstan
Date:
On 12/8/21 20:03, Andres Freund wrote:
>
> I wonder if we need a host2unix() helper accompanying perl2host()? Seems nicer
> than sprinkling s!\\!/!g if $PostgreSQL::Test::Utils::windows_os in a growing
> number of places...
>

Probably a good idea. I would call it canonical_path or something like
that. / works quite happily as a path separator in almost all contexts
on Windows - there are a handful of command line programs that choke on
it - but sometimes you need to quote the path. WHen I recently provided
for cross version upgrade testing on MSVC builds I just quoted
everything. On Unix/Msys the shell just removes the quotes.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: port conflicts when running tests concurrently on windows.

From
Andres Freund
Date:
Hi,

On 2021-12-09 14:35:37 +0100, Peter Eisentraut wrote:
> On 09.12.21 03:44, Thomas Munro wrote:
> > On Thu, Dec 9, 2021 at 11:46 AM Andres Freund<andres@anarazel.de>  wrote:
> > > Is it perhaps time to to use unix sockets on windows by default
> > > (i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?
> 
> Makes sense to get this to work, at least as an option.

With https://github.com/anarazel/postgres/commit/046203741803da863f6129739fd215f8a32ec357
all tests pass. pg_regress requires PG_REGRESS_SOCK_DIR because it checks for
TMPDIR, but windows only has TMP and TEMP.


> > Makes sense.  As a data point, it looks like this feature is in all
> > supported releases of Windows.  It arrived in 1803, already EOL'd, and
> > IIUC even a Windows Server 2016 "LTSC" system that's been disconnected
> > from the internet and refusing all updates reaches "mainstream EOL"
> > next month.
> 
> I believe the "18" in "1803" refers to 2018.  We have Windows buildfarm
> members that mention 2016 and 2017 in their title.  Would those be in
> trouble?

Perhaps it could make sense to print the windows version somewhere as part of
a windows build? Perhaps in the buildfarm client? Seems like it could be
generally useful, outside of this specific issue.

The most minimal thing would be to just print cmd /c ver or
such. systeminfo.exe output could also be useful, but has a bit of runtime
(1.5s on my windows VM).

Greetings,

Andres Freund



Re: port conflicts when running tests concurrently on windows.

From
Peter Eisentraut
Date:
On 09.12.21 19:41, Andres Freund wrote:
> Withhttps://github.com/anarazel/postgres/commit/046203741803da863f6129739fd215f8a32ec357
> all tests pass. pg_regress requires PG_REGRESS_SOCK_DIR because it checks for
> TMPDIR, but windows only has TMP and TEMP.

This looks reasonable so far.  The commit messages 
8f3ec75de4060d86176ad4ac998eeb87a39748c2 and 
1d53432ff940b789c2431ba476a2a6e2db3edf84 contain some notes about what I 
thought at the time didn't work yet.  In particular, the pg_upgrade 
tests don't support the use of Unix sockets on Windows.  (Those tests 
have been rewritten since, so I don't know what the status is.)  Also, 
the comment in pg_regress.c at remove_temp() is still there.  These 
things should probably be addressed before we can consider making this 
the default.



Re: port conflicts when running tests concurrently on windows.

From
Andres Freund
Date:
Hi,

On 2021-12-10 10:22:13 +0100, Peter Eisentraut wrote:
> On 09.12.21 19:41, Andres Freund wrote:
> > Withhttps://github.com/anarazel/postgres/commit/046203741803da863f6129739fd215f8a32ec357
> > all tests pass. pg_regress requires PG_REGRESS_SOCK_DIR because it checks for
> > TMPDIR, but windows only has TMP and TEMP.
> 
> This looks reasonable so far.

I pushed that part, since we clearly need something like them.


> The commit messages 8f3ec75de4060d86176ad4ac998eeb87a39748c2 and
> 1d53432ff940b789c2431ba476a2a6e2db3edf84 contain some notes about what I
> thought at the time didn't work yet.

> In particular, the pg_upgrade tests
> don't support the use of Unix sockets on Windows.  (Those tests have been
> rewritten since, so I don't know what the status is.)

ISTM we still use two different implementations of the pg_upgrade tests :(. I
recall there being some recent-ish work on moving it to be a tap test, but
apparently not yet committed.

It doesn't look like the vcregress.pl implementation respects
PG_TEST_USE_UNIX_SOCKETS right now.


> pg_regress.c at remove_temp() is still there.  These things should probably
> be addressed before we can consider making this the default.

Hm, not immediately obvious what to do about this. Do you know if windows has
restrictions around the length of unix domain sockets? If not, I wonder if it
could be worth using the data directory as the socket path on windows instead
of the separate temp directory?

Greetings,

Andres Freund



Re: port conflicts when running tests concurrently on windows.

From
Peter Eisentraut
Date:
On 13.12.21 20:33, Andres Freund wrote:
>> pg_regress.c at remove_temp() is still there.  These things should probably
>> be addressed before we can consider making this the default.
> 
> Hm, not immediately obvious what to do about this. Do you know if windows has
> restrictions around the length of unix domain sockets? If not, I wonder if it
> could be worth using the data directory as the socket path on windows instead
> of the separate temp directory?

According to src/include/port/win32.h, it's the same 108 or so bytes 
that everyone else uses.  That might not be the kernel limit, however, 
just the way the external API is defined.

After the reading the material again, I think that comment might be 
overly cautious.  The list of things not to do in a signal handler on 
Windows is not significantly different than those for say Linux, yet we 
do a lot of them anyway.  I'm tempted to just ignore the advice and do 
it anyway, while ignoring errors, which is what it already does.