Thread: gothic_moth, codlin_moth failures on REL8_2_STABLE
Since the buildfarm is mostly green these days, I took some time to look
into the few remaining consistent failures. One is that gothic_moth and
codlin_moth fail on contrib/tsearch2 in the 8.2 branch, with a
regression diff like this:

*** 2453,2459 ****
  <body>
  <b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
  <a href="http://www.google.com/foo.bar.html" target="_blank">YES </a>
! ff-bg
  <script>
  document.write(15);
  </script>
--- 2453,2459 ----
  <body>
  <b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
  <a href="http://www.google.com/foo.bar.html" target="_blank">YES </a>
! ff-bgff-bg
  <script>
  document.write(15);
  </script>

These animals are not testing any branches older than 8.2. The same
test appears in newer branches and passes, but the code involved got
migrated to core and probably changed around a bit.

I traced through this test on my own machine and determined that the
way it's supposed to work is like this: the tsearch parser breaks the
string into a series of tokens that include these:

	ff-bg	compound word
	ff	compound word element
	-	punctuation
	bg	compound word element

The highlight function is then supposed to set skip = 1 on the compound
word, causing it to be skipped when genhl() reconstructs the text.
The failure looks to me like the compound word is not getting skipped.
Both the setting and the testing of the flag seem to be absolutely
straightforward portable code; although the "skip" struct field is a
bitfield, which is a C feature we don't use very heavily.

My conclusion is that this is probably a compiler bug. Both buildfarm
animals appear to be using Sun Studio, although on different
architectures which weakens the compiler-bug theory a bit. Even though
we are not seeing the failure in later PG branches, it would probably be
worth looking into more closely, because if it's biting 8.2 it could
start biting newer code as well. But without access to a machine
showing the problem it is difficult to do much.

			regards, tom lane
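The skip mechanism described in the message above can be sketched in C. This is a minimal illustration under stated assumptions, not the tsearch2 source: the struct DemoWord, its "compound" field, and the functions mark_compound_words() and rebuild_text() are hypothetical names standing in for the real headline word struct, the highlight function, and genhl(). Only the "skip" bitfield is taken from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a headline word entry.  Only "skip" is named
 * in the thread; the other fields are assumed for illustration.
 */
typedef struct
{
    unsigned int compound:1;    /* token is a whole compound word (assumed) */
    unsigned int skip:1;        /* omit this token when rebuilding the text */
    const char *word;
} DemoWord;

/* Plays the role of the highlight function: flag whole compound words. */
static void
mark_compound_words(DemoWord *words, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (words[i].compound)
            words[i].skip = 1;
}

/* Plays the role of genhl(): concatenate every token that is not skipped. */
static void
rebuild_text(const DemoWord *words, size_t n, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        if (!words[i].skip)
            strncat(out, words[i].word, outlen - strlen(out) - 1);
}
```

Fed the four tokens listed in the message (ff-bg, ff, -, bg), the rebuilt text is ff-bg when the flag is honored. If the flag were never seen as set, the result would be ff-bgff-bg, which is the shape of the regression diff.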
On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My conclusion is that this is probably a compiler bug. Both buildfarm
> animals appear to be using Sun Studio, although on different
> architectures which weakens the compiler-bug theory a bit. Even though
> we are not seeing the failure in later PG branches, it would probably be
> worth looking into more closely, because if it's biting 8.2 it could
> start biting newer code as well. But without access to a machine
> showing the problem it is difficult to do much.

Could be this:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087

It's fixed in patch 124861-11 which came out Feb 23, 2009. Is this
patch missing on both gothic-moth and codlin-moth?

I suppose it's possible to have a configure test to check for whether
this patch is present but I'm not sure how much it's worthwhile given
that it'll only help people who happen to recompile their 8.2 server
after the next Postgres patch. And I'm not sure we can check for
patches without assuming the CC is the OS-shipped cc. Does cc itself
have an option to list which patches it has applied to it?

--
greg
Incidentally Zdenek came to the same conclusion that it was a compiler
bug in <4AA775A9.80702@sun.com>

--
greg
Greg Stark <gsstark@mit.edu> writes:
> Incidentally Zdenek came to the same conclusion that it was a compiler
> bug in <4AA775A9.80702@sun.com>

Drat, I had forgotten that exchange. I reconstructed Teodor's advice
the hard way :-(

			regards, tom lane
Greg Stark <gsstark@mit.edu> writes:
> On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My conclusion is that this is probably a compiler bug.

> Could be this:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087

Hmmm ... that doesn't seem to be quite an exact match, because the
setting and testing of the bitfield is in different functions in
different files in our case. Still, it seems related. It would be
useful to verify whether these two buildfarm animals are fully
up-to-date on compiler patches.

			regards, tom lane
Hi Tom,

I'm sorry that I did not look at this earlier.

I played with it and here are some facts. gothic (sparc) and codlin
(x86) use Sun Studio 12, and I set them up to use very high
optimization.

Gothic:
-------
-xalias_level=basic -xarch=native -xdepend -xmemalign=8s -xO5 -xprefetch=auto,explicit

Codlin:
-------
-xalias_level=basic -xarch=native -xdepend -xO4 -xprefetch=auto,explicit

-xO5 is the highest optimization level; -xO4 is one step below it.

I played with the flags and found that

"-xO4 -xalias_level=basic" generates the problem.
"-xO3 -xalias_level=basic" works fine
"-xO5" works fine

As the documentation says:

Quoted from the Sun Studio compiler guide:
http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view
------------------------------------------------------------------------
xalias_level=basic
------------------
If you use the -xalias_level=basic option, the compiler assumes that
memory references that involve different C basic types do not alias each
other. The compiler also assumes that references to all other types can
alias each other as well as any C basic type. The compiler assumes that
references using char * can alias any other type.

For example, at the -xalias_level=basic level, the compiler assumes that
a pointer variable of type int * is not going to access a float object.
Therefore it is safe for the compiler to perform optimizations that
assume a pointer of type float * will not alias the same memory that is
referenced with a pointer of type int *.

-xO4
-----
Performs automatic inlining of functions contained in the same file in
addition to performing -xO3 optimizations. This automatic inlining
usually improves execution speed, but sometimes makes it worse. In
general, this level results in increased code size.
------------------------------------------------------------------------

I redefined the bitfields as char in HLWORD and it works. Your guess is
correct. But the question remains where exactly the bitfields misbehave.
Any idea where I should look?
IIRC, I also had this problem on HEAD, when I tried to fix the tsearch
regression test for the Czech locale. The problem appears and
disappears.

	Zdenek
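The workaround Zdenek reports, redefining the HLWORD bitfields as char, can be pictured with the following sketch. The struct and function names here are hypothetical and the field list is assumed; this is not the HLWORD definition. It shows only the layout difference: adjacent bitfields are packed into one storage unit, so storing to one of them is a read-modify-write of bits that also hold its neighbors, while char flags are separate byte-sized objects. Whether that difference is what trips the optimizer is not established in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bitfield layout (assumed fields): both flags live in one storage unit. */
typedef struct
{
    unsigned int selected:1;
    unsigned int skip:1;
} FlagsAsBitfields;

/* Workaround layout: each flag is its own addressable byte. */
typedef struct
{
    char selected;
    char skip;
} FlagsAsChars;

/* Setting a bitfield rewrites part of the shared storage unit. */
static void
set_skip_bitfield(FlagsAsBitfields *f)
{
    assert(f != NULL);
    f->skip = 1;
}

/* Setting a char flag is a plain one-byte store. */
static void
set_skip_char(FlagsAsChars *f)
{
    assert(f != NULL);
    f->skip = 1;
}
```

On a correct compiler both variants behave identically; the char version trades a few bytes per entry for stores that do not touch neighboring flags.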
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> "-xO4 -xalias_level=basic" generates problem.
> "-xO3 -xalias_level=basic" works fine
> "-xO5" works fine

> As documentation say:

> Cite from Sun studio compiler guide:
> http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view

> xalias_level=basic
> ------------------
> If you use the -xalias_level=basic option, the compiler assumes that
> memory references that involve different C basic types do not alias each
> other. The compiler also assumes that references to all other types can
> alias each other as well as any C basic type. The compiler assumes that
> references using char * can alias any other type.

> For example, at the -xalias_level=basic level, the compiler assumes that
> a pointer variable of type int * is not going to access a float object.
> Therefore it is safe for the compiler to perform optimizations that
> assume a pointer of type float * will not alias the same memory that is
> referenced with a pointer of type int *.

I think you need to turn that off. On gcc we use -fno-strict-aliasing
which disables the type of compiler assumption that this is talking about.
I'm not sure exactly how that might create the specific failure we are
seeing here, but I can point you to lots and lots of places in the
sources where such an assumption would break things.

			regards, tom lane
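To make the aliasing assumption concrete, here is a generic C illustration. It is not taken from the PostgreSQL sources, and float_bits() is a hypothetical helper. The comment shows the kind of pointer cast that a compiler using type-based alias analysis is allowed to mishandle; the function shows a form that stays well defined regardless of the aliasing setting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Pattern that type-based alias analysis may break (shown as a comment
 * only, never executed here):
 *
 *     float    f = 1.0f;
 *     uint32_t bits = *(uint32_t *) &f;    reads a float through uint32_t *
 *
 * Under an assumption like -xalias_level=basic, or gcc without
 * -fno-strict-aliasing, the compiler may treat the float store and the
 * integer load as unrelated and reorder or drop one of them.
 */

/* Well-defined alternative: copy the bytes instead of casting pointers. */
static uint32_t
float_bits(float f)
{
    uint32_t bits;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code written in the memcpy style does not depend on the aliasing flags. Code written in the cast style needs the assumption turned off, which is the configuration change suggested in the message above.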
On 11.03.10 16:24, Greg Stark wrote:
> On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My conclusion is that this is probably a compiler bug. Both buildfarm
>> animals appear to be using Sun Studio, although on different
>> architectures which weakens the compiler-bug theory a bit. Even though
>> we are not seeing the failure in later PG branches, it would probably be
>> worth looking into more closely, because if it's biting 8.2 it could
>> start biting newer code as well. But without access to a machine
>> showing the problem it is difficult to do much.
>
> Could be this:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087
>
> It's fixed in patch 124861-11 which came out Feb 23, 2009. Is this
> patch missing on both gothic-moth and codlin-moth?

It looks like our case. See the compiler versions:

Ghost:
-bash-3.2$ cc -V
cc: Sun C 5.9 SunOS_sparc Patch 124867-09 2008/11/25

Codlin:
-bash-4.0$ cc -V
cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30

I should apply the patch on Ghost, but Codlin has to wait, because I
don't have control over the compiler version there. I will try to find
an updated SS12 somewhere on the disk/network.

The patch you refer to does not fix cc itself but some other
binaries/libs that cc uses. I will try to update Ghost and we will see
what happens.

> I suppose it's possible to have a configure test to check for whether
> this patch is present but I'm not sure how much it's worthwhile given
> that it'll only help people who happen to recompile their 8.2 server
> after the next Postgres patch. And I'm not sure we can check for
> patches without assuming the CC is the OS-shipped cc. Does cc itself
> have an option to list which patches it has applied to it?

cc is not shipped with Solaris; you have to install it separately. And
the bug appears only when you use high optimization (see my email). You
can see a patch version when you run cc -V, but that shows only the
version of the compiler itself.

	Zdenek
On 11.03.10 17:37, Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> "-xO4 -xalias_level=basic" generates problem.
>> "-xO3 -xalias_level=basic" works fine
>> "-xO5" works fine
>
>> As documentation say:
>
>> Cite from Sun studio compiler guide:
>> http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view
>
>> xalias_level=basic
>> ------------------
>> If you use the -xalias_level=basic option, the compiler assumes that
>> memory references that involve different C basic types do not alias each
>> other. The compiler also assumes that references to all other types can
>> alias each other as well as any C basic type. The compiler assumes that
>> references using char * can alias any other type.
>
>> For example, at the -xalias_level=basic level, the compiler assumes that
>> a pointer variable of type int * is not going to access a float object.
>> Therefore it is safe for the compiler to perform optimizations that
>> assume a pointer of type float * will not alias the same memory that is
>> referenced with a pointer of type int *.
>
> I think you need to turn that off. On gcc we use -fno-strict-aliasing
> which disables the type of compiler assumption that this is talking about.
> I'm not sure exactly how that might create the specific failure we are
> seeing here, but I can point you to lots and lots of places in the
> sources where such an assumption would break things.

OK. I will first try updating the compiler to the latest version to see
if it helps, and in the end I will remove the aliasing option.

	Thanks, Zdenek
On Thu, 11 Mar 2010 at 11:37 -0500, Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > "-xO4 -xalias_level=basic" generates problem.
> > "-xO3 -xalias_level=basic" works fine
> > "-xO5" works fine
>
> > As documentation say:
>
> > Cite from Sun studio compiler guide:
> > http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view
>
> > xalias_level=basic
> > ------------------
> > If you use the -xalias_level=basic option, the compiler assumes that
> > memory references that involve different C basic types do not alias each
> > other. The compiler also assumes that references to all other types can
> > alias each other as well as any C basic type. The compiler assumes that
> > references using char * can alias any other type.
>
> > For example, at the -xalias_level=basic level, the compiler assumes that
> > a pointer variable of type int * is not going to access a float object.
> > Therefore it is safe for the compiler to perform optimizations that
> > assume a pointer of type float * will not alias the same memory that is
> > referenced with a pointer of type int *.
>
> I think you need to turn that off. On gcc we use -fno-strict-aliasing
> which disables the type of compiler assumption that this is talking about.
> I'm not sure exactly how that might create the specific failure we are
> seeing here, but I can point you to lots and lots of places in the
> sources where such an assumption would break things.

Reconfigured, and both animals are green.

	Zdenek