Thread: CVS HEAD busted on Windows?
I notice buildfarm member snake is unhappy: The program "postgres" is needed by initdb but was not found in the same directory as "C:/msys/1.0/local/build-farm/HEAD/pgsql.696/src/test/regress/tmp_check/install/usr/local/build-farm/HEAD/inst/bin/initdb.exe". Check your installation. I'm betting Peter's recent patch to merge postmaster and postgres missed some little thing or other ... regards, tom lane
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
> Sent: 20 June 2006 00:02
> To: Peter Eisentraut; pgsql-hackers@postgresql.org
> Subject: [HACKERS] CVS HEAD busted on Windows?
>
> I notice buildfarm member snake is unhappy:
>
> The program "postgres" is needed by initdb but was not found in the
> same directory as
> "C:/msys/1.0/local/build-farm/HEAD/pgsql.696/src/test/regress/
> tmp_check/install/usr/local/build-farm/HEAD/inst/bin/initdb.exe".
> Check your installation.
>
> I'm betting Peter's recent patch to merge postmaster and
> postgres missed
> some little thing or other ...

Yeah, I had guessed at that, but as the breakage coincided with a 'make check' hang and a buildfarm script upgrade I was waiting until after tonight's run (following a cleanup yesterday) before reporting.

What I'm seeing this morning is another hang in 'make check'. After killing it off, the status can be seen at:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=snake&dt=2006-06-20%2001:00:01

As I'm currently running all three builds from one task, I'll go separate them out now...

Regards, Dave.
Dave Page wrote: > > > > >>-----Original Message----- >>From: pgsql-hackers-owner@postgresql.org >>[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane >>Sent: 20 June 2006 00:02 >>To: Peter Eisentraut; pgsql-hackers@postgresql.org >>Subject: [HACKERS] CVS HEAD busted on Windows? >> >>I notice buildfarm member snake is unhappy: >> >>The program "postgres" is needed by initdb but was not found in the >>same directory as >>"C:/msys/1.0/local/build-farm/HEAD/pgsql.696/src/test/regress/ >>tmp_check/install/usr/local/build-farm/HEAD/inst/bin/initdb.exe". >>Check your installation. >> >>I'm betting Peter's recent patch to merge postmaster and >>postgres missed >>some little thing or other ... >> >> > >Yeah, I had guessed at that, but as the breakage coincided with a 'make >check' hang and a buildfarm script upgrade I was waiting until after >tonights run (following a cleanup yesterday) before reporting. > >What I'm seeing this morning is another hang in 'make check'. After >killing it off, the status can be seen at: > >http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=snake&dt=2006-06-20%20 >01:00:01 > > > > 1. it's nothing to do with upgrading - I saw this also and on a machine that hasn't upgraded the buildfarm script. 2. It looks like you got past initdb, so Tom's theory about that is looking unlikely. cheers andrew
> -----Original Message----- > From: Andrew Dunstan [mailto:andrew@dunslane.net] > Sent: 20 June 2006 12:12 > To: Dave Page > Cc: Tom Lane; Peter Eisentraut; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] CVS HEAD busted on Windows? > > > 1. it's nothing to do with upgrading - I saw this also and on > a machine > that hasn't upgraded the buildfarm script. Nope, didn't think it did - just happened at the same time. > 2. It looks like you got past initdb, so Tom's theory about that is > looking unlikely. Yeah - I've been seeing both issues though - Tom's one where it complains that postgres.exe doesn't exist then bombs out of initdb (it does exist btw), and the one above where make check hangs and I end up killing it off. Regards, Dave
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Dave Page
> Sent: 20 June 2006 09:10
> To: Tom Lane; Peter Eisentraut; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] CVS HEAD busted on Windows?
>
> As I'm currently running all three builds from one task, I'll go
> separate them out now...

Well that helped - 8.0 and 8.1 built OK so I'm fairly sure nothing is screwed up on the box itself. HEAD hung in make check again though - I killed it this morning (after 6+ hours of runtime) and it reported:

    ============== creating database "regression" ==============
    CREATE DATABASE
    pg_regress: could not set database default locales
    make: *** [check] Error 2

The full output is at

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=snake&dt=2006-06-21%2001:00:01

Regards, Dave.

"Dave Page" <dpage@vale-housing.co.uk> writes:
> killed it this morning (after 6+ hours of runtime) and it reported:
> pg_regress: could not set database default locales

That would be here:

    "$bindir/psql" -q -X $psql_options -c "\
    alter database \"$dbname\" set lc_messages to 'C';
    alter database \"$dbname\" set lc_monetary to 'C';
    alter database \"$dbname\" set lc_numeric to 'C';
    alter database \"$dbname\" set lc_time to 'C';" "$dbname"
    if [ $? -ne 0 ]; then
        echo "$me: could not set database default locales"
        (exit 2); exit
    fi

Could you gdb the attached backend and see what it's doing? Is it actually consuming CPU, or just stuck?

regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 21 June 2006 14:30
> To: Dave Page
> Cc: Peter Eisentraut; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] CVS HEAD busted on Windows?
>
> "Dave Page" <dpage@vale-housing.co.uk> writes:
> > killed it this morning (after 6+ hours of runtime) and it reported:
> > pg_regress: could not set database default locales
>
> That would be here:
>
> "$bindir/psql" -q -X $psql_options -c "\
> alter database \"$dbname\" set lc_messages to 'C';
> alter database \"$dbname\" set lc_monetary to 'C';
> alter database \"$dbname\" set lc_numeric to 'C';
> alter database \"$dbname\" set lc_time to 'C';" "$dbname"
> if [ $? -ne 0 ]; then
>     echo "$me: could not set database default locales"
>     (exit 2); exit
> fi
>
> Could you gdb the attached backend and see what it's doing? Is it
> actually consuming CPU, or just stuck?

Hmmm, running it more interactively I see what was probably the hang when run from the scheduler:

    ---------------------------
    psql.exe - Application Error
    ---------------------------
    The instruction at "0x7c8396d0" referenced memory at "0x00000014". The
    memory could not be "read".
    Click on OK to terminate the program
    Click on CANCEL to debug the program
    ---------------------------
    OK   Cancel
    ---------------------------

(if that got fired off under the scheduler it would appear to hang as there would be no display to display it on, and no way to hit OK or Cancel).

Unfortunately gdb is giving a somewhat useless backtrace (yes, this is a debug build):

    (gdb) bt
    #0  0x7c822584 in ntdll!DbgUiIssueRemoteBreakin () from C:\WINDOWS\system32\ntdll.dll
    #1  0x7c845ea0 in ntdll!EtwGetTraceEnableFlags () from C:\WINDOWS\system32\ntdll.dll
    #2  0x00000005 in ?? ()
    #3  0x00000004 in ?? ()
    #4  0x00000001 in ?? ()
    #5  0x00bfffd0 in ?? ()
    #6  0x00000003 in ?? ()
    #7  0xffffffff in ?? ()
    #8  0x7c82f680 in ntdll!RtlGetNativeSystemInformation () from C:\WINDOWS\system32\ntdll.dll
    #9  0x7c845eb0 in ntdll!EtwGetTraceEnableFlags () from C:\WINDOWS\system32\ntdll.dll
    #10 0x00000000 in ?? () from
    #11 0x00000000 in ?? () from
    #12 0x00000000 in ?? () from
    #13 0x00000000 in ?? () from
    #14 0x00c00040 in ?? ()
    #15 0x00000000 in ?? () from
    #16 0x00000000 in ?? () from
    #17 0x00000000 in ?? () from
    #18 0x00000000 in ?? () from
    <snip another 1000 or so lines>

Visual Studio can't do much better, but that's to be expected because it can't understand the mingw debug symbols. :-(

Regards, Dave.
"Dave Page" <dpage@vale-housing.co.uk> writes: > Hmmm, running it more interactively I see what was probably the hang > when run from the scheduler: > --------------------------- > psql.exe - Application Error > --------------------------- > The instruction at "0x7c8396d0" referenced memory at "0x00000014". The > memory could not be "read". Well, that's pretty clearly a null pointer dereference. Any chance of mapping the instruction address back to something useful? This being a psql-side crash, it couldn't be related to the symptom we saw before of initdb not finding postgres.exe. I'm starting to wonder if maybe that machine has suddenly developed an intermittent memory fault or some such. Have you noticed any other flaky behavior? regards, tom lane
Tom Lane wrote: > > This being a psql-side crash, it couldn't be related to the symptom > we saw before of initdb not finding postgres.exe. I'm starting to > wonder if maybe that machine has suddenly developed an intermittent > memory fault or some such. Have you noticed any other flaky behavior? > > I have seen similar behaviour on my Windows box. cheers andrew
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 21 June 2006 16:08 > To: Dave Page > Cc: Peter Eisentraut; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] CVS HEAD busted on Windows? > > "Dave Page" <dpage@vale-housing.co.uk> writes: > > Hmmm, running it more interactively I see what was probably the hang > > when run from the scheduler: > > > --------------------------- > > psql.exe - Application Error > > --------------------------- > > The instruction at "0x7c8396d0" referenced memory at > "0x00000014". The > > memory could not be "read". > > Well, that's pretty clearly a null pointer dereference. Any chance of > mapping the instruction address back to something useful? Not sure how I would do that. Any hints? > This being a psql-side crash, it couldn't be related to the symptom > we saw before of initdb not finding postgres.exe. I'm starting to > wonder if maybe that machine has suddenly developed an intermittent > memory fault or some such. Have you noticed any other flaky behavior? Nope. 8.0 and 8.1 build fine now they're run from separate scheduler jobs. The management card hasn't reported any errors, and it's a server grade box with ECC RAM, RAID 10 disks etc. Plus Andrew's seeing a similar issue of course... Regards, Dave.
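A note on why a faulting address of "0x00000014" reads as a null pointer dereference: when code reads a field through a pointer that is NULL, the address the CPU touches is just the field's offset within the structure, which is a small number. The sketch below illustrates this with a made-up structure; `example_record`, its field names, and `fault_address` are hypothetical and are not taken from psql or from any Windows header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: five 4-byte fields precede the one being read. */
struct example_record {
    uint32_t a, b, c, d, e;   /* offsets 0x00 through 0x10 */
    uint32_t counter;         /* offset 0x14 */
};

/* Compile-time check that the field really sits at offset 0x14. */
static_assert(offsetof(struct example_record, counter) == 0x14,
              "counter is expected at offset 0x14");

/* Address that a read of base->counter would touch, computed
 * arithmetically so that nothing is actually dereferenced. */
uintptr_t fault_address(uintptr_t base)
{
    return base + offsetof(struct example_record, counter);
}
```

With a base of 0 (a NULL pointer), `fault_address` yields 0x14, the same kind of low address shown in the error dialog. Which structure was actually involved cannot be told from the address alone.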
"Dave Page" <dpage@vale-housing.co.uk> writes: >> Well, that's pretty clearly a null pointer dereference. Any chance of >> mapping the instruction address back to something useful? > Not sure how I would do that. Any hints? gdb might produce something useful, with something like x/32i 0xNNNN - 16 Failing that, see if you can get the linker to spit out a link map when it builds psql. regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 21 June 2006 16:32
> To: Dave Page
> Cc: Peter Eisentraut; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] CVS HEAD busted on Windows?
>
> "Dave Page" <dpage@vale-housing.co.uk> writes:
> >> Well, that's pretty clearly a null pointer dereference.
> Any chance of
> >> mapping the instruction address back to something useful?
>
> > Not sure how I would do that. Any hints?
>
> gdb might produce something useful, with something like
>
> x/32i 0xNNNN - 16

Not useful - it's all in ntdll.dll.

> Failing that, see if you can get the linker to spit out a
> link map when
> it builds psql.

Attached.

With a little poking in the VS debugger, it would appear that the address shown is in fact in ntdll.dll. It shows the following call stack:

    > ntdll.dll!7c8396d0()
      [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
      psql.exe!0040a269()
      ntdll.dll!7c839620()
      psql.exe!00404532()
      psql.exe!00404983()
      ntdll.dll!7c82fb23()
      psql.exe!0040a269()
      psql.exe!0040a32b()
      psql.exe!004099d2()
      ntdll.dll!7c82f9dd()
      msvcrt.dll!77bbcef6()
      msvcrt.dll!77bcc077()
      psql.exe!004011e7()
      psql.exe!00401238()
      kernel32.dll!77e523cd()

If I'm reading it (and understanding) correctly, that suggests the caller might be GetVariable().

Regards, Dave.
Hi,

From: "Dave Page" <dpage@vale-housing.co.uk>
Subject: Re: [HACKERS] CVS HEAD busted on Windows?
Date: Wed, 21 Jun 2006 15:39:34 +0100

> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Sent: 21 June 2006 14:30
> > To: Dave Page
> > Cc: Peter Eisentraut; pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] CVS HEAD busted on Windows?
> >
> > "Dave Page" <dpage@vale-housing.co.uk> writes:
> > > killed it this morning (after 6+ hours of runtime) and it reported:
> > > pg_regress: could not set database default locales
> >
> > That would be here:
> >
> > "$bindir/psql" -q -X $psql_options -c "\
> > alter database \"$dbname\" set lc_messages to 'C';
> > alter database \"$dbname\" set lc_monetary to 'C';
> > alter database \"$dbname\" set lc_numeric to 'C';
> > alter database \"$dbname\" set lc_time to 'C';" "$dbname"
> > if [ $? -ne 0 ]; then
> >     echo "$me: could not set database default locales"
> >     (exit 2); exit
> > fi
> >
> > Could you gdb the attached backend and see what it's doing? Is it
> > actually consuming CPU, or just stuck?
>
> Hmmm, running it more interactively I see what was probably the hang
> when run from the scheduler:
>
> ---------------------------
> psql.exe - Application Error
> ---------------------------
> The instruction at "0x7c8396d0" referenced memory at "0x00000014". The
> memory could not be "read".
> Click on OK to terminate the program
> Click on CANCEL to debug the program
> ---------------------------
> OK   Cancel
> ---------------------------
>
> (if that got fired off under the scheduler it would appear to hang as
> there would be no display to display it on, and no way to hit OK or
> Cancel).
>
> Unfortunately gdb is giving a somewhat useless backtrace (yes, this is a
> debug build):

I got a backtrace.

    Program received signal SIGSEGV, Segmentation fault.
    0x7c958fea in _libwsock32_a_iname ()
    (gdb) bt
    #0  0x7c958fea in _libwsock32_a_iname ()
    #1  0x7c951538 in _libwsock32_a_iname ()
    #2  0x7c94104b in _libwsock32_a_iname ()
    #3  0x00404e68 in SendQuery (query=0x3d2669 "select 1") at common.c:815
    #4  0x0040951a in main (argc=9, argv=0x3d27c8) at startup.c:306

cancelConnLock is not initialized (with InitializeCriticalSection()) in startup.c when psql is executed with the '-c' option. Is the following patch right?

    Index: startup.c
    ===================================================================
    RCS file: /projects/cvsroot/pgsql/src/bin/psql/startup.c,v
    retrieving revision 1.133
    diff -c -r1.133 startup.c
    *** startup.c   14 Jun 2006 16:49:02 -0000      1.133
    --- startup.c   21 Jun 2006 17:36:26 -0000
    ***************
    *** 303,308 ****
    --- 303,313 ----
              if (VariableEquals(pset.vars, "ECHO", "all"))
                  puts(options.action_string);

    + #ifdef WIN32
    +         /* establish control-C handling for interactive operation */
    +         setup_cancel_handler();
    + #endif
    +
              successResult = SendQuery(options.action_string)
                  ? EXIT_SUCCESS : EXIT_FAILURE;
          }

--
Yoshiyuki Asaba
y-asaba@sraoss.co.jp
Yoshiyuki Asaba <y-asaba@sraoss.co.jp> writes: > cancelConnLock is not initialized (InitializeCriticalSection()) at > startup.c when psql execute with '-c' option. Doh! My fault. > Is the following patch right? I think we should just move the setup_cancel_handler call up earlier in startup.c. regards, tom lane
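The ordering problem discussed here can be sketched in isolation: a lock has to be initialized before any code path takes it, so the setup call belongs ahead of the point where the program branches by mode. The following is only an illustration under stated assumptions, not psql's actual code. The names `cancel_lock`, `setup_cancel_lock`, `send_query_sketch` and `run_sketch` are hypothetical, and a pthread mutex stands in for the Windows critical section on non-Windows builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef CRITICAL_SECTION lock_t;
#define lock_init(l)    InitializeCriticalSection(l)
#define lock_acquire(l) EnterCriticalSection(l)
#define lock_release(l) LeaveCriticalSection(l)
#else
#include <pthread.h>
typedef pthread_mutex_t lock_t;
#define lock_init(l)    pthread_mutex_init((l), NULL)
#define lock_acquire(l) pthread_mutex_lock(l)
#define lock_release(l) pthread_mutex_unlock(l)
#endif

static lock_t cancel_lock;              /* hypothetical stand-in for cancelConnLock */
static bool cancel_lock_ready = false;

/* Hypothetical setup step: must run before any code path takes the lock. */
void setup_cancel_lock(void)
{
    if (!cancel_lock_ready) {
        lock_init(&cancel_lock);
        cancel_lock_ready = true;
    }
}

/* Hypothetical query sender: takes the lock regardless of mode. */
bool send_query_sketch(const char *query)
{
    assert(cancel_lock_ready);          /* catches use-before-init ordering bugs */
    lock_acquire(&cancel_lock);
    bool ok = (query != NULL && query[0] != '\0');
    lock_release(&cancel_lock);
    return ok;
}

/* Setup happens once, before the code branches by mode. */
bool run_sketch(const char *command_or_null)
{
    setup_cancel_lock();
    if (command_or_null != NULL)
        return send_query_sketch(command_or_null);  /* single-command path */
    return send_query_sketch("select 1");           /* interactive stand-in */
}
```

Placing the setup call at the top of `run_sketch` means neither branch can reach the lock uninitialized, which is the effect of moving the call earlier rather than adding a second call in the single-command branch.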
"Dave Page" <dpage@vale-housing.co.uk> writes: > Yeah - I've been seeing both issues though - Tom's one where it > complains that postgres.exe doesn't exist then bombs out of initdb (it > does exist btw), and the one above where make check hangs and I end up > killing it off. OK, now that Yoshiyuki-san found my silly error about not initializing cancelConnLock in the right place, we can get back to the postgres.exe- doesn't-exist problem. Anyone have a clue about that? Has anyone seen it anywhere besides snake? regards, tom lane
Tom Lane said: > "Dave Page" <dpage@vale-housing.co.uk> writes: >> Yeah - I've been seeing both issues though - Tom's one where it >> complains that postgres.exe doesn't exist then bombs out of initdb (it >> does exist btw), and the one above where make check hangs and I end up >> killing it off. > > OK, now that Yoshiyuki-san found my silly error about not initializing > cancelConnLock in the right place, we can get back to the postgres.exe- > doesn't-exist problem. Anyone have a clue about that? Has anyone seen > it anywhere besides snake? loris has just gone green again as a result of the fix, so I don't see any problems. cheers andrew
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 21 June 2006 20:43 > To: Dave Page > Cc: Andrew Dunstan; Peter Eisentraut; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] CVS HEAD busted on Windows? > > "Dave Page" <dpage@vale-housing.co.uk> writes: > > Yeah - I've been seeing both issues though - Tom's one where it > > complains that postgres.exe doesn't exist then bombs out of > initdb (it > > does exist btw), and the one above where make check hangs > and I end up > > killing it off. > > OK, now that Yoshiyuki-san found my silly error about not initializing > cancelConnLock in the right place, we can get back to the > postgres.exe- > doesn't-exist problem. Anyone have a clue about that? Has > anyone seen > it anywhere besides snake? I'm beginning to wonder if that's a red herring from where I was killing things off and re-running in a possibly un-clean tree. It seems to be OK tonight anyway, so let's just keep an eye on it for now. As a sidenote on the postgres/postmaster merge subject though - Magnus & I were wondering if Peter's change means we no longer need to ship postmaster.exe and postgres.exe with pgInstaller. Presumably we can just use postgres.exe for everything now? Regards, Dave.
Dave Page wrote: > >As a sidenote on the postgres/postmaster merge subject though - Magnus & >I were wondering if Peter's change means we no longer need to ship >postmaster.exe and postgres.exe with pgInstaller. Presumably we can just >use postgres.exe for everything now? > > > > Won't we still need to know if we are called as postmaster or postgres? cheers andrew
> -----Original Message-----
> From: Andrew Dunstan [mailto:andrew@dunslane.net]
> Sent: 22 June 2006 14:06
> To: Dave Page
> Cc: Tom Lane; Peter Eisentraut; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] CVS HEAD busted on Windows?
>
> Dave Page wrote:
>
> >As a sidenote on the postgres/postmaster merge subject
> though - Magnus &
> >I were wondering if Peter's change means we no longer need to ship
> >postmaster.exe and postgres.exe with pgInstaller. Presumably
> we can just
> >use postgres.exe for everything now?
>
> Won't we still need to know if we are called as postmaster or
> postgres?

Unless the 'postmaster' instance starts all its sub processes with an additional option to tell them they're children (I haven't looked at the code yet so I dunno if this is how it's done).

For those that are unaware, because Windows doesn't support symlinks, we currently ship two copies of the binary. We could save 3.2MB (uncompressed, 8.1.4) if we could lose one of them.

Regards, Dave.
Dave Page wrote:

>>Won't we still need to know if we are called as postmaster or
>>postgres?
>
>Unless the 'postmaster' instance starts all it's sub processes with an
>additional option to tell them they're children (I haven't looked at the
>code yet so I dunno if this is how it's done).
>
>For those that are unaware, because Windows doesn't support symlinks, we
>currently ship two copies of the binary. We could save 3.2MB
>(uncompressed, 8.1.4) if we could lose one of them.

Windows children could be handled, I think, but there is also standalone postgres.

3.2 MB is not insignificant, but I think we can live with it.

cheers

andrew
> Dave Page wrote: > > Won't we still need to know if we are called as postmaster or > > postgres? > > Unless the 'postmaster' instance starts all it's sub processes with an > additional option to tell them they're children (I haven't looked at the > code yet so I dunno if this is how it's done). > > For those that are unaware, because Windows doesn't support symlinks, we > currently ship two copies of the binary. We could save 3.2MB > (uncompressed, 8.1.4) if we could lose one of them. I look at that structure was successful by huge backend.dll at 8.2. In spite of not arranging it yet, it looks great. However, Several K-Bytes are still used vainly. But, I am not investigating which the is still good. Regards, Hiroshi Saito
> -----Original Message-----
> From: Andrew Dunstan [mailto:andrew@dunslane.net]
> Sent: 22 June 2006 14:26
> To: Dave Page
> Cc: Tom Lane; Peter Eisentraut; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] postmaster.exe vs postgres.exe
>
> Windows children could be handled, I think, but here is also
> standalone
> postgres.

True.

> 3.2 Mb is not insignificant, but I think we can live with it.

That's about 1.4MB compressed BTW. We can live with it, but it is still a *lot* of bandwidth when you start talking about hundreds of thousands of downloads.

Regards, Dave.
"Dave Page" <dpage@vale-housing.co.uk> writes: >>> though - Magnus & >>> I were wondering if Peter's change means we no longer need to ship >>> postmaster.exe and postgres.exe with pgInstaller. Presumably >>> we can just use postgres.exe for everything now? >> Won't we still need to know if we are called as postmaster or >> postgres? No. The entire point of the recent changes is that the behavior no longer depends on the name of the executable, only on the switches. In the Unix distributions, the only reason to keep the postmaster symlink is to avoid breaking old start scripts that invoke "postmaster". We may be able to drop the symlink eventually, though I see no reason to be in a hurry about it. In the Windows case, I think you'd have to ask if there are any start- script-equivalents outside your control that you're worried about breaking. Given the distribution-size penalty you face by having two copies, obviously you're more motivated to drop the extra .exe sooner than we'll probably do in the Unix distros. regards, tom lane
Tom Lane wrote: >>>Won't we still need to know if we are called as postmaster or >>>postgres? >>> >>> > >No. The entire point of the recent changes is that the behavior no >longer depends on the name of the executable, only on the switches. > > > > Oh. My mistake. That sounds good. cheers andrew
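The distinction being made here, behavior chosen by switches rather than by the executable's name, can be sketched as two small dispatch functions. This is a hypothetical illustration and not the actual PostgreSQL source: `run_mode`, `mode_from_name` and `mode_from_switches` are invented names, and the `--single` switch is used only as an example of a mode-selecting option, so the real option names should be checked against the server documentation.

```c
#include <assert.h>
#include <string.h>

typedef enum { MODE_POSTMASTER, MODE_STANDALONE } run_mode;

/* Old style (sketch): behavior is inferred from the executable's name,
 * so two differently named copies or a symlink are required. */
run_mode mode_from_name(const char *argv0)
{
    const char *base = argv0;

    /* Strip any directory prefix, accepting both separator styles. */
    for (const char *p = argv0; *p; p++)
        if (*p == '/' || *p == '\\')
            base = p + 1;

    return strncmp(base, "postmaster", 10) == 0 ? MODE_POSTMASTER
                                                : MODE_STANDALONE;
}

/* New style (sketch): only the switches matter and argv[0] is ignored,
 * so a single binary under any name behaves the same way. */
run_mode mode_from_switches(int argc, char *const argv[])
{
    if (argc > 1 && strcmp(argv[1], "--single") == 0)
        return MODE_STANDALONE;
    return MODE_POSTMASTER;
}
```

Under the second scheme a second copy of the binary carries no information of its own, which is why shipping only one executable becomes possible.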
Tom Lane wrote: > "Dave Page" <dpage@vale-housing.co.uk> writes: > >>> though - Magnus & > >>> I were wondering if Peter's change means we no longer need to ship > >>> postmaster.exe and postgres.exe with pgInstaller. Presumably > >>> we can just use postgres.exe for everything now? > > >> Won't we still need to know if we are called as postmaster or > >> postgres? > > No. The entire point of the recent changes is that the behavior no > longer depends on the name of the executable, only on the switches. > > In the Unix distributions, the only reason to keep the postmaster > symlink is to avoid breaking old start scripts that invoke "postmaster". > We may be able to drop the symlink eventually, though I see no reason > to be in a hurry about it. > > In the Windows case, I think you'd have to ask if there are any start- > script-equivalents outside your control that you're worried about > breaking. Given the distribution-size penalty you face by having two > copies, obviously you're more motivated to drop the extra .exe sooner > than we'll probably do in the Unix distros. Can't the installer just copy postgres.exe to postmaster.exe during install? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
> -----Original Message----- > From: Bruce Momjian [mailto:bruce@momjian.us] > Sent: 23 June 2006 07:09 > To: Tom Lane > Cc: Dave Page; Andrew Dunstan; Peter Eisentraut; > pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] postmaster.exe vs postgres.exe (was: > CVS HEAD busted on Windows?) > > Can't the installer just copy postgres.exe to postmaster.exe during > install? That's not something that Windows Installer does - we'd have to write some code to do it at the end of the installation, then call it as a custom action. Actually it'd probably be fairly trivial, but I'm having a hard time imagining why anyone would be relying on the existence of postmaster.exe anyway, unless they were packaging their own release in which case it's their problem anyhoo. Regards, Dave.