Thread: ECPG failure on BF member Vaquita (Windows Vista)
I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might be causing this? http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=vaquita&dt=2007-04-24%2020:00:05 The only other Vista buildfarm member (baiji, on the same physical box) is running MSVC builds which don't yet test ECPG from what I can see. Regards, Dave
On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote: > I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might > be causing this? Hmm, first glance suggests some permission problems. I never touched a Vista system so far, so I'm at a loss as far as details are concerned. I also saw that wombat is segfaulting in ecpg tests but not only with CVS HEAD but also trying to test 8.2. Any idea what's going on with this machine? Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
Michael Meskes wrote: > On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote: >> I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might >> be causing this? > > Hmm, first glance suggests some permission problems. Yes, that was my thought as well, however I ran cacls down the entire build tree, granting full control to Everyone on everything, but the problem persists. Additionally, all the other tests pass, including make check and make installcheck, so unless the ECPG tests are trying to create something outside of the buildtree, I don't think it is a permissions issue. Is there anything I can try building/running manually to help debug this? > I also saw that wombat is segfaulting in ecpg tests but not only with > CVS HEAD but also trying to test 8.2. Any idea what's going on with this > machine? > > Michael Can't help with that one I'm afraid. Regards, Dave
Dave Page wrote: > Michael Meskes wrote: > >> On Wed, Apr 25, 2007 at 10:47:57AM +0100, Dave Page wrote: >> >>> I'm seeing an ECPG-Check failure on Windows Vista - any ideas what might >>> be causing this? >>> >> Hmm, first glance suggests some permission problems. >> > > Yes, that was my thought as well, however I ran cacls down the entire > build tree, granting full control to Everyone on everything, but the > problem persists. > Please don't do that on your buildfarm repo copy (if that's what you did). You should not touch *anything* inside it. If need to you do this, make a copy (see later) and alter that. If you did do this to the buildfarm repo copy, please blow it away so that buildfarm will get a fresh clean copy next time it runs. > Additionally, all the other tests pass, including make check and make > installcheck, so unless the ECPG tests are trying to create something > outside of the buildtree, I don't think it is a permissions issue. > > Is there anything I can try building/running manually to help debug this? > To recreate the failing buildfarm environment (including making an alterable repo copy), run the following (with suitable local mods): run_build.pl --test --keepall At the end of that you should have the build tree that was actually used, and the install tree too if it got that far. Then you can start playing manually. Remember to move or remove the kept trees before your next buildfarm run. cheers andrew
Andrew Dunstan wrote: > Please don't do that on your buildfarm repo copy (if that's what you > did). You should not touch *anything* inside it. If need to you do this, > make a copy (see later) and alter that. > > If you did do this to the buildfarm repo copy, please blow it away so > that buildfarm will get a fresh clean copy next time it runs. No, I reset the permissions on /msys. Thats entirely necessary on Vista where the permissions are so locked down by default that you can't even create it without some measure of pain. Regards, Dave.
Dave Page wrote: > Andrew Dunstan wrote: > >> Please don't do that on your buildfarm repo copy (if that's what you >> did). You should not touch *anything* inside it. If need to you do this, >> make a copy (see later) and alter that. >> >> If you did do this to the buildfarm repo copy, please blow it away so >> that buildfarm will get a fresh clean copy next time it runs. >> > > No, I reset the permissions on /msys. Thats entirely necessary on Vista > where the permissions are so locked down by default that you can't even > create it without some measure of pain. > > Yes, my Vista experience so far has been very unpleasant indeed. It strikes me as the OS equivalent of jumping the shark. I haven't even thought about Msys ... cheers andrew
On 4/25/07, Michael Meskes <meskes@postgresql.org> wrote: > I also saw that wombat is segfaulting in ecpg tests but not only with > CVS HEAD but also trying to test 8.2. Any idea what's going on with this > machine? I generated a stack trace for REL8_2_STABLE, but I'm not sure how helpful it is. Let me know what other information I can provide... Looks like I don't have symbols for libc. Core was generated by `sql/update '. Program terminated with signal 11, Segmentation fault. #0 ECPGget_variable (ap=<value optimized out>, type=<value optimized out>, var=0x10028730, indicator=1 '\001') at execute.c:131 131 var->ind_value = *((char **) (var->ind_pointer)); (gdb) bt #0 ECPGget_variable (ap=<value optimized out>, type=<value optimized out>, var=0x10028730, indicator=1 '\001') at execute.c:131 #1 0x0000040000048948 in ECPGdo (lineno=28, compat=<value optimized out>, force_indicator=<value optimized out>, connection_name=<valueoptimized out>, query=<value optimized out>) at execute.c:195 #2 0x0000000010000d20 in main (argc=<value optimized out>, argv=<value optimized out>) at update.pgc:28 #3 0x000004000063ce4c in .generic_start_main () from /lib/libc.so.6 #4 0x000004000063d0f8 in .__libc_start_main () from /lib/libc.so.6 #5 0x0000000000000000 in ?? () Regards, Mark
Mark Wong wrote: > On 4/25/07, Michael Meskes <meskes@postgresql.org> wrote: >> I also saw that wombat is segfaulting in ecpg tests but not only with >> CVS HEAD but also trying to test 8.2. Any idea what's going on with this >> machine? > > I generated a stack trace for REL8_2_STABLE, but I'm not sure how > helpful it is. Let me know what other information I can provide... > Looks like I don't have symbols for libc. > > Core was generated by `sql/update '. > Program terminated with signal 11, Segmentation fault. > #0 ECPGget_variable (ap=<value optimized out>, type=<value optimized > out>, > var=0x10028730, indicator=1 '\001') at execute.c:131 > 131 var->ind_value = *((char **) > (var->ind_pointer)); > (gdb) bt > #0 ECPGget_variable (ap=<value optimized out>, type=<value optimized > out>, > var=0x10028730, indicator=1 '\001') at execute.c:131 > #1 0x0000040000048948 in ECPGdo (lineno=28, compat=<value optimized > out>, > force_indicator=<value optimized out>, > connection_name=<value optimized out>, query=<value optimized out>) > at execute.c:195 > #2 0x0000000010000d20 in main (argc=<value optimized out>, > argv=<value optimized out>) at update.pgc:28 > #3 0x000004000063ce4c in .generic_start_main () from /lib/libc.so.6 > #4 0x000004000063d0f8 in .__libc_start_main () from /lib/libc.so.6 > #5 0x0000000000000000 in ?? () > > > I think you'll need to compile with optimisation turned off and then try running the test under debugger control, putting a breakpoint in ECPGget_variable() and then stepping through it. I wonder what value of var->ind_pointer it is getting? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I think you'll need to compile with optimisation turned off and then try > running the test under debugger control, putting a breakpoint in > ECPGget_variable() and then stepping through it. I wonder what value of > var->ind_pointer it is getting? You could probably inspect the contents of *var in that dump without having to recompile. Given that this is PPC64, I'm betting on a pointer size or alignment problem ... regards, tom lane
On 4/25/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > I think you'll need to compile with optimisation turned off and then try > > running the test under debugger control, putting a breakpoint in > > ECPGget_variable() and then stepping through it. I wonder what value of > > var->ind_pointer it is getting? > > You could probably inspect the contents of *var in that dump without > having to recompile. Given that this is PPC64, I'm betting on a pointer > size or alignment problem ... Does this help? (gdb) p var->type $1 = 4267828624 (gdb) p var->value $2 = (void *) 0x1 (gdb) p var->pointer $3 = (void *) 0x1 (gdb) p var->varcharsize $4 = 3 (gdb) p var->arrsize $5 = 4 (gdb) p var->offset $6 = 29 (gdb) p var->ind_type $7 = 0 (gdb) p var->ind_pointer $8 = (void *) 0x0 (gdb) p var->ind_varcharsize $9 = 0 (gdb) p var->ind_arrsize $10 = 0 (gdb) p var->ind_offset $11 = 5 (gdb) p var->next $12 = (struct variable *) 0x0 Regards, Mark
"Mark Wong" <markwkm@gmail.com> writes: > Does this help? > (gdb) p var->ind_pointer > $8 = (void *) 0x0 Well, that seems to be the reason why it's failing to indirect through ind_pointer ... but why is it only failing on your machine and not everyone else's? I think this indicates something unportable about ecpg's usage of va_list. Hmm, and I don't have to look far to find a smoking gun: #if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__))if (create_statement(lineno,compat, force_indicator, con, &stmt, query, args) == false) #elseif (create_statement(lineno, compat, force_indicator, con, &stmt, query, &args) == false) #endif Why in the world is that like that? We don't have such a kluge anyplace else we use va_list. stringinfo.c for instance has never needed any such thing. Mark, does your gcc define __powerpc__, or only __powerpc64__? regards, tom lane
On 4/25/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Mark Wong" <markwkm@gmail.com> writes: > > Does this help? > > > (gdb) p var->ind_pointer > > $8 = (void *) 0x0 > > Well, that seems to be the reason why it's failing to indirect through > ind_pointer ... but why is it only failing on your machine and not > everyone else's? I think this indicates something unportable about > ecpg's usage of va_list. > > Hmm, and I don't have to look far to find a smoking gun: > > #if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__)) > if (create_statement(lineno, compat, force_indicator, con, &stmt, query, args) == false) > #else > if (create_statement(lineno, compat, force_indicator, con, &stmt, query, &args) == false) > #endif > > Why in the world is that like that? We don't have such a kluge > anyplace else we use va_list. stringinfo.c for instance has > never needed any such thing. > > Mark, does your gcc define __powerpc__, or only __powerpc64__? $ touch foo.c; gcc -E -dM foo.c | grep __p ; rm foo.c #define __powerpc64__ 1 #define __powerpc__ 1 Regards, Mark
Tom Lane wrote: > > Hmm, and I don't have to look far to find a smoking gun: > > #if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__)) > if (create_statement(lineno, compat, force_indicator, con, &stmt, query, args) == false) > #else > if (create_statement(lineno, compat, force_indicator, con, &stmt, query, &args) == false) > #endif > > > I also see: #if defined(__GNUC__) && (defined (__powerpc__) || defined(__amd64__) || defined(__x86_64__)) #define APREF ap #else #define APREF *ap #endif But I also see that my amd64/FC6 machine does pass these tests with gcc. I would certainly be nice if we could simplify all this. And if not, we should have a note about why it's needed. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > But I also see that my amd64/FC6 machine does pass these tests with gcc. Yeah, but the typedef represented by va_list can and probably does vary between amd64 and ppc64. I haven't an easy way to check, but I wonder whether it's not an array type on ppc. I'm of the opinion that ecpg is trying to do something here that's not portable. The C99 draft I have says [#3] The type declared is va_list which is an object type suitable for holding information needed by the macros va_start, va_arg, va_end, andva_copy. If access to the varying arguments is desired, the called function shall declare an object (referredto as ap in this subclause) having type va_list. The object ap may be passed as an argument to anotherfunction; if that function invokes the va_arg macro with parameter ap, the value of ap in the callingfunction is indeterminate and shall be passed to the va_end macro prior to any further reference to ap. (198) ____________________ 198 It is permitted to create a pointer to a va_list and pass that pointer to another function, in whichcase the original function may make further use of the original list after the other function returns. The footnote seems to say that what the code is doing is OK ... but there isn't any such footnote in the Single Unix Spec: http://www.opengroup.org/onlinepubs/007908799/xsh/stdarg.h.html which makes me wonder just how portable it really is. My recommendation is to get rid of the APREF hack, deal only in va_list not &va_list, and inline ECPGget_variable into the two places it's used to avoid the question of passing va_lists around after they've been modified. The routine's not that big (especially seeing that only half of it is actually shared by the two callers) and it's just not worth the notational effort, let alone any portability risks, to factor it out. regards, tom lane
On Wed, Apr 25, 2007 at 03:17:19PM -0400, Tom Lane wrote: > Why in the world is that like that? We don't have such a kluge > anyplace else we use va_list. stringinfo.c for instance has > never needed any such thing. I don't remember the exact details but this was added a long time ago before 8.0 because we had some problems with one of the archs. I was suprised about the differences back then too, but haven't checked since. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
On Wed, Apr 25, 2007 at 04:38:30PM -0400, Tom Lane wrote: > My recommendation is to get rid of the APREF hack, deal only in > va_list not &va_list, and inline ECPGget_variable into the two > places it's used to avoid the question of passing va_lists around > after they've been modified. The routine's not that big (especially > seeing that only half of it is actually shared by the two callers) > and it's just not worth the notational effort, let alone any portability > risks, to factor it out. Having spend countless hours debugging this stuff I fully agree with you. It's not just ECPGget_variable though. I also had to inline create_statement. This is only called once, so no big deal, but the calling function gets bigger by quite a margin that way. It was definitely easier to read with these functions, but again all this hassle isn't worth it. Attached you'll find a patch that should inline both functions and remove the APREF stuff. This successfully runs the regression suite on my Linux box. Please test it on those archs that needed special treatment before I commit. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
ARGH! This time with patch. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
Attachment
Michael Meskes wrote: > > Attached you'll find a patch that should inline both functions and > remove the APREF stuff. This successfully runs the regression suite on > my Linux box. Please test it on those archs that needed special > treatment before I commit. > If you commit to HEAD it will be automatically tested on the buildfarm. cheers andrew
On Thu, Apr 26, 2007 at 06:28:29AM -0500, Andrew Dunstan wrote: > If you commit to HEAD it will be automatically tested on the buildfarm. True. But it might also break a lot of other archs without helping on those troubled ones. I thought this way would be better. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
On 4/26/07, Michael Meskes <meskes@postgresql.org> wrote: > On Wed, Apr 25, 2007 at 04:38:30PM -0400, Tom Lane wrote: > > My recommendation is to get rid of the APREF hack, deal only in > > va_list not &va_list, and inline ECPGget_variable into the two > > places it's used to avoid the question of passing va_lists around > > after they've been modified. The routine's not that big (especially > > seeing that only half of it is actually shared by the two callers) > > and it's just not worth the notational effort, let alone any portability > > risks, to factor it out. > > Having spend countless hours debugging this stuff I fully agree with > you. It's not just ECPGget_variable though. I also had to inline > create_statement. This is only called once, so no big deal, but the > calling function gets bigger by quite a margin that way. It was > definitely easier to read with these functions, but again all this > hassle isn't worth it. > > Attached you'll find a patch that should inline both functions and > remove the APREF stuff. This successfully runs the regression suite on > my Linux box. Please test it on those archs that needed special > treatment before I commit. I applied it to REL8_2_STABLE/pgsql and ran 'run_build.pl --test --keepall REL8_2_STABLE'. Looks like it passed everything for me. :) Regards, Mark
Michael Meskes <meskes@postgresql.org> writes: > Having spend countless hours debugging this stuff I fully agree with > you. It's not just ECPGget_variable though. I also had to inline > create_statement. AFAICS you do not need to inline create_statement. The risk factor is where you call a routine that does something with a va_list, and then you want to do something else (other than va_end) with that va_list after it returns. The one use of create_statement doesn't do that, hence no problem. (We know this works, because stringinfo.c does it.) BTW, I think there is a small bug there: you fail to call va_end() in the failure path. I'm not sure if there are any modern machines where va_end() isn't a no-op, but nonetheless the code isn't meeting the spec. regards, tom lane
On Thu, Apr 26, 2007 at 01:35:42PM -0400, Tom Lane wrote: > AFAICS you do not need to inline create_statement. The risk factor > is where you call a routine that does something with a va_list, and > then you want to do something else (other than va_end) with that va_list > after it returns. The one use of create_statement doesn't do that, > hence no problem. (We know this works, because stringinfo.c does it.) Given all the problems I had so far with this code I'd prefer to keep it inlined. :-) > BTW, I think there is a small bug there: you fail to call va_end() in the > failure path. I'm not sure if there are any modern machines where > va_end() isn't a no-op, but nonetheless the code isn't meeting the spec. Fixed. Thanks. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
On Thu, Apr 26, 2007 at 10:29:47AM -0700, Mark Wong wrote: > I applied it to REL8_2_STABLE/pgsql and ran 'run_build.pl --test > --keepall REL8_2_STABLE'. Looks like it passed everything for me. :) Thanks for this test. I just committed the changes to CVS. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!