Thread: Windows exit code 128 ... it's baaack
I looked at the postmaster log for the ongoing issue on narwhal (to wit, that the contrib/dblink test dies the moment it tries to do anything dblink-y), and looky here what the postmaster has logged:

[530fc965.bac:2] LOG: server process (PID 2144) exited with exit code 128
[530fc965.bac:3] DETAIL: Failed process was running: SELECT *
FROM dblink('dbname=contrib_regression','SELECT * FROM foo') AS t(a int, b text, c text[])
WHERE t.a > 7;
[530fc965.bac:4] LOG: server process (PID 2144) exited with exit code 0
[530fc965.bac:5] LOG: terminating any other active server processes

The double report of the same process exiting can be attributed to the kluge at postmaster.c lines 2906..2926 (in HEAD): that code logs an ERROR_WAIT_NO_CHILDREN (128) exit and then resets the exitstatus to zero. Further down, where we realize the process failed to disable its deadman switch, we report the phony "exit 0" status and begin the system reset cycle.

Now, back in the 2010 thread where we agreed to put in the ignore-128 kluge, it was asserted that all known cases of this exit code were irreproducible Windows flakiness occurring at process start or exit. This is evidently neither start nor exit, but is likely being provoked by trying to load libpq into the backend. And it appears to be 100.00% reproducible on narwhal.

Seems worth poking into a little closer. Not by me, though; I don't do Windows.

regards, tom lane
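To see why the same PID is reported twice, it may help to model the exit handling Tom describes. This is a simplified sketch, not the actual postmaster.c code: the function `model_cleanup_backend`, the `ChildOutcome` enum and the `log_lines` counter are hypothetical, and the deadman switch is reduced to a single `slot_released` flag. `ERROR_WAIT_NO_CHILDREN` is the Windows error code 128, redefined locally so the sketch builds on any platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* winerror.h defines ERROR_WAIT_NO_CHILDREN as 128; redefined here so
 * this sketch builds on any platform. */
#define ERROR_WAIT_NO_CHILDREN 128

/* Hypothetical outcomes for this model; not PostgreSQL identifiers. */
typedef enum
{
    CHILD_OK,          /* exit handled, nothing more to do */
    CHILD_CRASH_RESET  /* terminate other backends and reinitialize */
} ChildOutcome;

/* Counts emitted LOG lines so the double report can be checked. */
static int log_lines = 0;

static void
log_exit(int pid, int exitstatus)
{
    printf("LOG: server process (PID %d) exited with exit code %d\n",
           pid, exitstatus);
    log_lines++;
}

/* Report the (possibly rewritten) exit status and start the reset cycle. */
static ChildOutcome
crash_reset(int pid, int exitstatus)
{
    log_exit(pid, exitstatus);
    printf("LOG: terminating any other active server processes\n");
    log_lines++;
    return CHILD_CRASH_RESET;
}

/*
 * Hypothetical, simplified model of the backend-exit path described above.
 * slot_released stands in for "the child disarmed its deadman switch".
 */
static ChildOutcome
model_cleanup_backend(int pid, int exitstatus, bool slot_released)
{
    assert(pid > 0);

    /* The ignore-128 kluge: log the exit, then treat it as a clean one. */
    if (exitstatus == ERROR_WAIT_NO_CHILDREN)
    {
        log_exit(pid, exitstatus);
        exitstatus = 0;
    }

    /* In this model, exit codes other than 0 and 1 count as a crash. */
    if (exitstatus != 0 && exitstatus != 1)
        return crash_reset(pid, exitstatus);

    /*
     * A "clean" exit that left the deadman switch armed is a crash after
     * all; exitstatus has already been rewritten, hence the phony "exit 0".
     */
    if (!slot_released)
        return crash_reset(pid, exitstatus);

    return CHILD_OK;
}
```

Calling `model_cleanup_backend(2144, 128, false)` prints the same three LOG lines seen on narwhal (without the DETAIL line): exit code 128, then exit code 0, then the termination notice. A child that exits with 128 but does release its slot is logged once and otherwise ignored, which is what the kluge was meant to do.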
On Fri, Feb 28, 2014 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I looked at the postmaster log for the ongoing issue on narwhal (to wit, that the contrib/dblink test dies the moment it tries to do anything dblink-y), and looky here what the postmaster has logged:
>
> [530fc965.bac:2] LOG: server process (PID 2144) exited with exit code 128
> [530fc965.bac:3] DETAIL: Failed process was running: SELECT *
> FROM dblink('dbname=contrib_regression','SELECT * FROM foo') AS t(a int, b text, c text[])
> WHERE t.a > 7;
> [530fc965.bac:4] LOG: server process (PID 2144) exited with exit code 0
> [530fc965.bac:5] LOG: terminating any other active server processes
>
> Now, back in the 2010 thread where we agreed to put in the ignore-128 kluge, it was asserted that all known cases of this exit code were irreproducible Windows flakiness occurring at process start or exit. This is evidently neither start nor exit, but likely is being provoked by trying to load libpq into the backend.

Most of the information on the net regarding this error code indicates that it is related to particular Windows versions, and there are even a few hotfixes for it, for example:
http://support.microsoft.com/kb/974509

I'm not sure how relevant such hotfixes are to the current case, as most of that information indicates that it happens during CreateProcess(), whereas the current failure doesn't seem to be related to CreateProcess().

I have tried to reproduce it on my local Windows machine (Win-7), but couldn't. I think looking at the Event Viewer logs from that time might give some clue.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2014-02-27 19:14:13 -0500, Tom Lane wrote:
> I looked at the postmaster log for the ongoing issue on narwhal (to wit, that the contrib/dblink test dies the moment it tries to do anything dblink-y), and looky here what the postmaster has logged:

One interesting bit about this is that it seems to work in 9.3...

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-02-27 19:14:13 -0500, Tom Lane wrote:
>> I looked at the postmaster log for the ongoing issue on narwhal (to wit, that the contrib/dblink test dies the moment it tries to do anything dblink-y), and looky here what the postmaster has logged:

> One interesting bit about this is that it seems to work in 9.3...

Well, yeah, it seems to have been broken somehow by the Windows linking changes we did a while back to try to ensure that missing PGDLLIMPORT markers would be detected reliably. Which we did not back-patch.

regards, tom lane
On 2014-04-05 11:05:09 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-02-27 19:14:13 -0500, Tom Lane wrote:
> >> I looked at the postmaster log for the ongoing issue on narwhal (to wit, that the contrib/dblink test dies the moment it tries to do anything dblink-y), and looky here what the postmaster has logged:
>
> > One interesting bit about this is that it seems to work in 9.3...
>
> Well, yeah, it seems to have been broken somehow by the Windows linking changes we did awhile back to try to ensure that missing PGDLLIMPORT markers would be detected reliably. Which we did not back-patch.

Hard to say, since there have been no working builds of HEAD on narwhal for so long :(.

Greetings,

Andres Freund
On 04/07/2014 10:26 AM, Andres Freund wrote:
> On 2014-04-05 11:05:09 -0400, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> On 2014-02-27 19:14:13 -0500, Tom Lane wrote:
>>>> I looked at the postmaster log for the ongoing issue on narwhal (to wit, that the contrib/dblink test dies the moment it tries to do anything dblink-y), and looky here what the postmaster has logged:
>>> One interesting bit about this is that it seems to work in 9.3...
>> Well, yeah, it seems to have been broken somehow by the Windows linking changes we did awhile back to try to ensure that missing PGDLLIMPORT markers would be detected reliably. Which we did not back-patch.
> Hard to say, since there have been no working builds of HEAD on narwhal for so long :(.
>

This issue has been hanging around for many months, possibly much longer, since the last successful build on narwhal was 2012-08-01; it then went quiet until 2014-02-03, when it came back with this error.

If we don't care to find a fix, I think we need to declare narwhal's fairly ancient compiler out of support and decommission it. Other gcc systems we have with more modern compilers are not hitting this issue.

cheers

andrew