Thread: Windows exit code 128 ... it's baaack

Windows exit code 128 ... it's baaack

From
Tom Lane
Date:
I looked at the postmaster log for the ongoing issue on narwhal
(to wit, that the contrib/dblink test dies the moment it tries
to do anything dblink-y), and looky here what the postmaster
has logged:

[530fc965.bac:2] LOG:  server process (PID 2144) exited with exit code 128
[530fc965.bac:3] DETAIL:  Failed process was running: SELECT *
        FROM dblink('dbname=contrib_regression','SELECT * FROM foo') AS t(a int, b text, c text[])
        WHERE t.a > 7;
[530fc965.bac:4] LOG:  server process (PID 2144) exited with exit code 0
[530fc965.bac:5] LOG:  terminating any other active server processes

The double report of the same process exiting can be attributed to
the kluge at postmaster.c lines 2906..2926 (in HEAD): that code
logs an ERROR_WAIT_NO_CHILDREN (128) exit and then resets the exitstatus
to zero.  Further down, where we realize the process failed to disable
its deadman switch, we report the phony "exit 0" status and begin the
system reset cycle.
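
The sequence described above can be modeled in a few lines. This is a
hypothetical Python sketch, not the actual postmaster.c code: the function
name reap_child, its parameters, and the exact control flow are assumptions
made for illustration. Only the value 128 for ERROR_WAIT_NO_CHILDREN and the
log wording come from the report above.

```python
# Hypothetical model of the behaviour described in the message above.
# This is NOT the real postmaster.c logic; all names are invented.
ERROR_WAIT_NO_CHILDREN = 128  # exit code that the kluge special-cases


def reap_child(pid, exit_status, released_deadman_switch):
    """Return the log lines a postmaster-like reaper would emit for one child."""
    log = []
    if exit_status == ERROR_WAIT_NO_CHILDREN:
        # The kluge: report the 128 exit, then pretend the exit was clean.
        log.append(f"LOG:  server process (PID {pid}) exited with exit code {exit_status}")
        exit_status = 0
    if exit_status != 0 or not released_deadman_switch:
        # A child that never released its deadman switch is treated as
        # crashed, even though its (rewritten) status is now zero.
        log.append(f"LOG:  server process (PID {pid}) exited with exit code {exit_status}")
        log.append("LOG:  terminating any other active server processes")
    return log


for line in reap_child(2144, 128, released_deadman_switch=False):
    print(line)
```

In this model, a 128 exit from a child that did release its deadman switch
produces only the first log line and no reset, which is the case the kluge
was meant to cover.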

Now, back in the 2010 thread where we agreed to put in the ignore-128
kluge, it was asserted that all known cases of this exit code were
irreproducible Windows flakiness occurring at process start or exit.
This is evidently neither start nor exit, but likely is being provoked
by trying to load libpq into the backend.  And it appears to be 100.00%
reproducible on narwhal.
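
One way to check a log for this pattern is to look for an exit-128 report
followed by a DETAIL line naming a running query, which suggests the backend
was mid-query rather than starting up or shutting down. A rough Python
sketch, assuming the log format shown above; the regular expressions and the
function name find_mid_query_128 are illustrative and not part of PostgreSQL.
The sample lines are the excerpt from this message, with the query cut to its
first line.

```python
import re

# Patterns written against the log excerpt quoted above (an assumption:
# other log_line_prefix settings only change the text before "LOG:").
EXIT_RE = re.compile(r"server process \(PID (\d+)\) exited with exit code (\d+)")
DETAIL_RE = re.compile(r"DETAIL:\s+Failed process was running: (.*)")


def find_mid_query_128(lines):
    """Return (pid, query_start) for each exit-128 report that has a DETAIL line."""
    hits = []
    pending_pid = None
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            # Remember the PID only when the reported exit code is 128.
            pending_pid = int(m.group(1)) if m.group(2) == "128" else None
            continue
        d = DETAIL_RE.search(line)
        if d and pending_pid is not None:
            hits.append((pending_pid, d.group(1).strip()))
            pending_pid = None
    return hits


sample = [
    "[530fc965.bac:2] LOG:  server process (PID 2144) exited with exit code 128",
    "[530fc965.bac:3] DETAIL:  Failed process was running: SELECT *",
    "[530fc965.bac:4] LOG:  server process (PID 2144) exited with exit code 0",
    "[530fc965.bac:5] LOG:  terminating any other active server processes",
]
print(find_mid_query_128(sample))
```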

Seems worth poking into a little closer.  Not by me, though; I don't
do Windows.
        regards, tom lane



Re: Windows exit code 128 ... it's baaack

From
Amit Kapila
Date:
On Fri, Feb 28, 2014 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I looked at the postmaster log for the ongoing issue on narwhal
> (to wit, that the contrib/dblink test dies the moment it tries
> to do anything dblink-y), and looky here what the postmaster
> has logged:
>
> [530fc965.bac:2] LOG:  server process (PID 2144) exited with exit code 128
> [530fc965.bac:3] DETAIL:  Failed process was running: SELECT *
>         FROM dblink('dbname=contrib_regression','SELECT * FROM foo') AS t(a int, b text, c text[])
>         WHERE t.a > 7;
> [530fc965.bac:4] LOG:  server process (PID 2144) exited with exit code 0
> [530fc965.bac:5] LOG:  terminating any other active server processes
>
>
> Now, back in the 2010 thread where we agreed to put in the ignore-128
> kluge, it was asserted that all known cases of this exit code were
> irreproducible Windows flakiness occurring at process start or exit.
> This is evidently neither start nor exit, but likely is being provoked
> by trying to load libpq into the backend.

Most of the information on the net about this error code indicates
that it is tied to particular Windows versions, and there are even a
few hotfixes for it, for example:
http://support.microsoft.com/kb/974509

I'm not sure how relevant such hotfixes are to the current case: most
reports say the error happens during CreateProcess(), but the
current failure doesn't seem to be related to CreateProcess().

I tried to reproduce it on my local Windows machine (Win-7), but
couldn't. I think looking at the Event Viewer logs from around
that time might give some clue.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Windows exit code 128 ... it's baaack

From
Andres Freund
Date:
On 2014-02-27 19:14:13 -0500, Tom Lane wrote:
> I looked at the postmaster log for the ongoing issue on narwhal
> (to wit, that the contrib/dblink test dies the moment it tries
> to do anything dblink-y), and looky here what the postmaster
> has logged:

One interesting bit about this is that it seems to work in 9.3...

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Windows exit code 128 ... it's baaack

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-02-27 19:14:13 -0500, Tom Lane wrote:
>> I looked at the postmaster log for the ongoing issue on narwhal
>> (to wit, that the contrib/dblink test dies the moment it tries
>> to do anything dblink-y), and looky here what the postmaster
>> has logged:

> One interesting bit about this is that it seems to work in 9.3...

Well, yeah, it seems to have been broken somehow by the Windows
linking changes we did awhile back to try to ensure that missing
PGDLLIMPORT markers would be detected reliably.  Which we did not
back-patch.
        regards, tom lane



Re: Windows exit code 128 ... it's baaack

From
Andres Freund
Date:
On 2014-04-05 11:05:09 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-02-27 19:14:13 -0500, Tom Lane wrote:
> >> I looked at the postmaster log for the ongoing issue on narwhal
> >> (to wit, that the contrib/dblink test dies the moment it tries
> >> to do anything dblink-y), and looky here what the postmaster
> >> has logged:
> 
> > One interesting bit about this is that it seems to work in 9.3...
> 
> Well, yeah, it seems to have been broken somehow by the Windows
> linking changes we did awhile back to try to ensure that missing
> PGDLLIMPORT markers would be detected reliably.  Which we did not
> back-patch.

Hard to say, since there have been no working builds for HEAD on
narwhal for so long :(.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Windows exit code 128 ... it's baaack

From
Andrew Dunstan
Date:
On 04/07/2014 10:26 AM, Andres Freund wrote:
> On 2014-04-05 11:05:09 -0400, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> On 2014-02-27 19:14:13 -0500, Tom Lane wrote:
>>>> I looked at the postmaster log for the ongoing issue on narwhal
>>>> (to wit, that the contrib/dblink test dies the moment it tries
>>>> to do anything dblink-y), and looky here what the postmaster
>>>> has logged:
>>> One interesting bit about this is that it seems to work in 9.3...
>> Well, yeah, it seems to have been broken somehow by the Windows
>> linking changes we did awhile back to try to ensure that missing
>> PGDLLIMPORT markers would be detected reliably.  Which we did not
>> back-patch.
> Hard to say, since there have been no working builds for HEAD on
> narwhal for so long :(.
>


This issue has been hanging around for many months, possibly much longer:
the last successful build on narwhal was 2012-08-01, after which it
went quiet until 2014-02-03, when it came back with this error.

If we don't care to find a fix, I think we need to declare narwhal's 
fairly ancient compiler out of support and decommission it. Other gcc 
systems we have with more modern compilers are not getting this issue.

cheers

andrew