Re: Sometimes the output to the stdout in Windows disappears - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Sometimes the output to the stdout in Windows disappears
Date
Msg-id ee02eaa2-03f7-74ea-bbdf-3196e506bae3@gmail.com
In response to Re: Sometimes the output to the stdout in Windows disappears  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Sometimes the output to the stdout in Windows disappears  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hello hackers,

13.09.2020 21:37, Tom Lane wrote:
> I happened to try googling for other similar reports, and I found
> a very interesting recent thread here:
>
> https://github.com/nodejs/node/issues/33166
>
> It might not have the same underlying cause, of course, but it sure
> sounds familiar.  If Node.js are really seeing the same effect,
> that would point to an underlying Windows bug rather than anything
> Postgres is doing wrong.
>
> It doesn't look like the Node.js crew got any closer to
> understanding the issue than we have, unfortunately.  They made
> their problem mostly go away by reverting a seemingly-unrelated
> patch.  But I can't help thinking that it's a timing-related bug,
> and that patch was just unlucky enough to change the timing of
> their tests so that they saw the failure frequently.
I've managed to make a simple reproducer. Please see the attached patch.
Two things are crucial for reproducing the bug:
    ioctlsocket(sock, FIONBIO, &ioctlsocket_ret); // from pgwin32_socket()
and
    WSACleanup();
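
For readers without the attachment, the two calls above might be combined roughly as follows. This is a hedged sketch, not the attached patch: the function name `write_then_cleanup`, the placement of the `printf`, the `closesocket` call, and the Winsock version requested are all assumptions made for illustration. The non-Windows branch exists only so the file compiles elsewhere; it does not exercise the bug.

```c
/* Hypothetical sketch of the call sequence; NOT the attached reproducer. */
#include <stdio.h>

#ifdef _WIN32
#include <winsock2.h>           /* link with ws2_32 */

/* Returns 0 on success, 1 if any Winsock call fails. */
int write_then_cleanup(void)
{
    WSADATA wsa;
    SOCKET  sock;
    u_long  nonblocking = 1;

    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return 1;

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == INVALID_SOCKET)
    {
        WSACleanup();
        return 1;
    }

    /* First ingredient: put the socket into non-blocking mode. */
    if (ioctlsocket(sock, FIONBIO, &nonblocking) != 0)
    {
        closesocket(sock);
        WSACleanup();
        return 1;
    }
    closesocket(sock);

    /* The line a test harness would expect on stdout (assumed placement). */
    printf("test\n");
    fflush(stdout);

    /* Second ingredient: shut Winsock down before the process exits. */
    WSACleanup();
    return 0;
}
#else
/* No Winsock here: only emit the line so the file still builds. */
int write_then_cleanup(void)
{
    printf("test\n");
    fflush(stdout);
    return 0;
}
#endif
```

Calling `write_then_cleanup()` from a trivial `main` and running the resulting program many times with stdout redirected to a pipe would be one way to look for a run whose output is missing.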

I still can't understand what influences the effect. With this reproducer I
get:
vcregress taptest src\test\modules\connect
...
t/000_connect.pl .. # test
#
t/000_connect.pl .. 13346/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 16714/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 26216/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 30077/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 36505/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 43647/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 53070/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 54402/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 55685/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 83193/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 99992/100000 # Looks like you failed 10 tests of 100000.
t/000_connect.pl .. Dubious, test returned 10 (wstat 2560, 0xa00)
Failed 10/100000 subtests
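
The log above is consistent with a loop that runs a program many times and fails an iteration when the expected output is missing. The actual `t/000_connect.pl` is part of the attachment and is not shown here, so the following is only a sketch of such a counting loop in C; the function name `count_silent_runs` and the use of `popen` are assumptions, not the test's real implementation.

```c
/* Hypothetical harness: count runs of a command that print nothing. */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* for popen/pclose under strict C modes */
#endif
#include <stdio.h>

#ifdef _WIN32
#define popen  _popen
#define pclose _pclose
#endif

/*
 * Run `cmd` `runs` times, reading its stdout through a pipe.
 * Returns the number of runs that produced no output at all,
 * or -1 if the command could not be started.
 */
int count_silent_runs(const char *cmd, int runs)
{
    int silent = 0;

    for (int i = 0; i < runs; i++)
    {
        char   buf[256];
        size_t total = 0;
        size_t n;
        FILE  *p = popen(cmd, "r");

        if (p == NULL)
            return -1;

        /* Drain the pipe and remember how many bytes arrived. */
        while ((n = fread(buf, 1, sizeof buf, p)) > 0)
            total += n;
        pclose(p);

        if (total == 0)
            silent++;
    }
    return silent;
}
```

With a failure rate like the one in the log (10 out of 100000), a large `runs` value is needed before a nonzero count becomes likely.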

But in our test farm the pgbench test (from the installcheck-world
suite, which we run using msys) can fail on roughly every third run.
Perhaps it depends on I/O load: searching files or scanning the disk in
parallel seems to increase the probability of the glitch.
I see no solution for this on the Postgres side for now, but this
information about Windows quirks could be useful in case someone
else stumbles upon it too.

Best regards,
Alexander
