Thread: [HACKERS] MSVC odd TAP test problem

[HACKERS] MSVC odd TAP test problem

From: Andrew Dunstan

I have been working on enabling the remaining TAP tests on MSVC build in
the buildfarm client, but I have come across an odd problem. The bin
tests all run fine, but the recover tests crash and in such a way as to
crash the buildfarm client itself and require some manual cleanup. This
happens at some stage after the tests have run (the final "ok" is
output) but before the END handler in PostgresNode.pm (I put some traces
in there to see if I could narrow down where there were problems).

The symptom is that this appears at the end of the output when the
client calls "vcregress.pl taptest src/test/recover":

    Terminating on signal SIGBREAK(21)
    Terminating on signal SIGBREAK(21)
    Terminate batch job (Y/N)?

And at that point there is nothing at all apparently running, according
to Sysinternals Process Explorer, including the buildfarm client.

It's 100% repeatable on bowerbird, and I'm a bit puzzled about how to
fix it.


Anyone have any clues?


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] MSVC odd TAP test problem

From: Craig Ringer


On 7 May 2017 4:24 am, "Andrew Dunstan" <andrew.dunstan@2ndquadrant.com> wrote:

> I have been working on enabling the remaining TAP tests on MSVC build in
> the buildfarm client, but I have come across an odd problem. The bin
> tests all run fine, but the recover tests crash and in such a way as to
> crash the buildfarm client itself and require some manual cleanup. This
> happens at some stage after the tests have run (the final "ok" is
> output) but before the END handler in PostgresNode.pm (I put some traces
> in there to see if I could narrow down where there were problems).
>
> The symptom is that this appears at the end of the output when the
> client calls "vcregress.pl taptest src/test/recover":
>
>     Terminating on signal SIGBREAK(21)
>     Terminating on signal SIGBREAK(21)
>     Terminate batch job (Y/N)?
>
> And at that point there is nothing at all apparently running, according
> to Sysinternals Process Explorer, including the buildfarm client.
>
> It's 100% repeatable on bowerbird, and I'm a bit puzzled about how to
> fix it.
>
> Anyone have any clues?

That looks like we've upset CMD.exe itself. I'm not sure how ... leaking a signal to the parent proc?

I suspect this could be something to do with console process groups.

Bowerbird is Win8, so this isn't going to be related to the support for ANSI escapes added in Win10.

A search for the error turns up a complaint about IPC::Run as the first hit. Probably not a coincidence.

http://stackoverflow.com/q/40924750

See this bug:

https://rt.cpan.org/Public/Bug/Display.html?id=101093

Re: [HACKERS] MSVC odd TAP test problem

From: Andrew Dunstan

On 05/06/2017 07:41 PM, Craig Ringer wrote:
>
>
> That looks like we've upset CMD.exe itself. I'm not sure how ...
> leaking a signal to the parent proc?
>
> I suspect this could be something to do with console process groups.
>
> Bowerbird is Win8, so this isn't going to be related to the support
> for ANSI escapes added in Win10.
>
> A search for the error turns up a complaint about IPC::Run as the
> first hit. Probably not a coincidence.
>
>
> http://stackoverflow.com/q/40924750
>
> See this bug
>
> https://rt.cpan.org/Public/Bug/Display.html?id=101093
>
>
>



Actually, it's Win10, looks like I forgot to update the personality, my bad.

I had a feeling it was probably something to do with timeout. That RT
ticket looks like it's on the money.

cheers

andrew





Re: [HACKERS] MSVC odd TAP test problem

From: Andrew Dunstan

On 05/06/2017 08:54 PM, Andrew Dunstan wrote:
>
> Actually, it's Win10, looks like I forgot to update the personality, my bad.
>
> I had a feeling it was probably something to do with timeout. That RT
> ticket looks like it's on the money.
>



(After extensive trial and error) Turns out it's not quite that, it's
the kill_kill stuff. I think for now we should just disable it on the
platform. That means not running tests 7 and 8 of the logical_decoding
tests and all of the crash_recovery test. Test::More has nice
facilities for skipping tests cleanly. See attached patch.
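
For readers unfamiliar with those facilities, here is a minimal, self-contained sketch of the two Test::More skipping styles. It is not the attached patch: the test names and the $is_native_windows variable are made up for illustration, and only the $Config{osname} eq 'MSWin32' check is taken from this thread.

```perl
use strict;
use warnings;
use Config;
use Test::More;

# Hypothetical flag for this sketch: true on native Windows Perl builds.
my $is_native_windows = $Config{osname} eq 'MSWin32';

# Whole-file skip: use this form instead of a test plan when every test in
# the script depends on the unsupported feature. It ends the script at once.
# plan skip_all => 'Test fails on Windows perl' if $is_native_windows;

plan tests => 3;

ok(1, 'portable check runs everywhere');

SKIP:
{
    # Skip exactly the two tests in this block on native Windows Perl.
    # The count must match the number of tests being skipped.
    skip 'Test fails on Windows perl', 2 if $is_native_windows;

    ok(1, 'first check that would need start/kill_kill');
    ok(1, 'second check that would need start/kill_kill');
}
```

Skipped tests are still reported in the TAP output, marked as skipped, so the plan count stays valid on every platform.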

cheers

andrew





Re: [HACKERS] MSVC odd TAP test problem

From: Michael Paquier

On Wed, May 10, 2017 at 2:11 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> (After extensive trial and error) Turns out it's not quite that, it's
> the kill_kill stuff. I think for now we should just disable it on the
> platform. That means not running tests 7 and 8 of the logical_decoding
> tests and all of the crash_recovery test. Test::More has nice
> facilities for skipping tests cleanly. See attached patch.

+SKIP:
+{
+    # some Windows Perls at least don't like IPC::Run's start/kill_kill regime.
+    skip "Test fails on Windows perl", 2 if $Config{osname} eq 'MSWin32';
So this basically works with msys but not with MSWin32? Interesting...

Does it make a difference if you use, for example, coup_d_grace =>
"QUIT"? Per the docs of IPC::Run, SIGTERM is used for kills on Windows.

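For reference, the coup_d_grace suggestion above would look roughly like the sketch below. It assumes IPC::Run is installed (it is not a core module). The sleeping child command and the 5-second grace period are made up for illustration, and nothing here is confirmed to avoid the SIGBREAK problem on bowerbird.

```perl
use strict;
use warnings;
use IPC::Run qw(start);

# Hypothetical child for this sketch: a Perl process that just sleeps.
my @cmd = ($^X, '-e', 'sleep 60');

# start() launches the child and returns a harness without waiting for it.
my $h = start(\@cmd);

# kill_kill sends TERM first. If the child is still alive once the grace
# period (in seconds) runs out, it sends the coup_d_grace signal, QUIT here
# instead of the default KILL. It croaks if the child cannot be reaped.
my $reaped = eval { $h->kill_kill(grace => 5, coup_d_grace => 'QUIT'); 1 };

print $reaped ? "child reaped\n" : "kill_kill failed: $@";
```
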
+if  ($Config{osname} eq 'MSWin32')
+{
+    # some Windows Perls at least don't like IPC::Run's start/kill_kill regime.
+    plan skip_all => "Test fails on Windows perl";
+}
Indentation is weird here, with a mix of spaces and tabs.
-- 
Michael