Thread: gothic_moth, codlin_moth failures on REL8_2_STABLE

gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Tom Lane
Date:
Since the buildfarm is mostly green these days, I took some time to look
into the few remaining consistent failures.  One is that gothic_moth and
codlin_moth fail on contrib/tsearch2 in the 8.2 branch, with a
regression diff like this:

*** 2453,2459 ****
    <body>
    <b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
    <a href="http://www.google.com/foo.bar.html" target="_blank">YES  </a>
!   ff-bg
    <script>
           document.write(15);
    </script>
--- 2453,2459 ----
    <body>
    <b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
    <a href="http://www.google.com/foo.bar.html" target="_blank">YES  </a>
!  ff-bgff-bg
    <script>
           document.write(15);
    </script>

These animals are not testing any branches older than 8.2.  The same
test appears in newer branches and passes, but the code involved got
migrated to core and probably changed around a bit.

I traced through this test on my own machine and determined that the
way it's supposed to work is like this: the tsearch parser breaks the
string into a series of tokens that include these:

    ff-bg        compound word
    ff           compound word element
    -            punctuation
    bg           compound word element

The highlight function is then supposed to set skip = 1 on the compound
word, causing it to be skipped when genhl() reconstructs the text.
The failure looks to me like the compound word is not getting skipped.
Both the setting and the testing of the flag seem to be absolutely
straightforward portable code; although the "skip" struct field is a
bitfield, which is a C feature we don't use very heavily.
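
To make the mechanism concrete, here is a minimal sketch in C. It is not the actual tsearch2 code: the DemoWord struct and both function names are hypothetical, and only the idea of a one-bit "skip" flag that is set in one place and tested in another comes from the description above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a headline word entry. */
typedef struct
{
    unsigned int selected:1;
    unsigned int skip:1;        /* when set, the word is left out of the output */
    const char *word;
} DemoWord;

/* Mark a compound word so that only its elements are emitted. */
static void
demo_mark_skip(DemoWord *w)
{
    w->skip = 1;
}

/*
 * Concatenate every word whose skip flag is clear into out (capacity outlen).
 * Stops early rather than overflowing the buffer. Returns out.
 */
static char *
demo_rebuild(const DemoWord *words, size_t n, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return out;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
    {
        size_t len;

        if (words[i].skip)
            continue;
        len = strlen(words[i].word);
        if (used + len + 1 > outlen)
            break;
        memcpy(out + used, words[i].word, len + 1);
        used += len;
    }
    return out;
}
```

With the four tokens "ff-bg", "ff", "-", "bg", setting skip on the first one makes the rebuilt text "ff-bg". If the flag were lost, the result would be "ff-bgff-bg", which is the doubled text seen in the failing diff.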

My conclusion is that this is probably a compiler bug.  Both buildfarm
animals appear to be using Sun Studio, although on different
architectures which weakens the compiler-bug theory a bit.  Even though
we are not seeing the failure in later PG branches, it would probably be
worth looking into more closely, because if it's biting 8.2 it could
start biting newer code as well.  But without access to a machine
showing the problem it is difficult to do much.
        regards, tom lane


Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Greg Stark
Date:
On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My conclusion is that this is probably a compiler bug.  Both buildfarm
> animals appear to be using Sun Studio, although on different
> architectures which weakens the compiler-bug theory a bit.  Even though
> we are not seeing the failure in later PG branches, it would probably be
> worth looking into more closely, because if it's biting 8.2 it could
> start biting newer code as well.  But without access to a machine
> showing the problem it is difficult to do much.


Could be this:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087

It's fixed in patch 124861-11 which came out Feb 23, 2009. Is this
patch missing on both gothic-moth and codlin-moth?

I suppose it's possible to have a configure test to check for whether
this patch is present but I'm not sure how much it's worthwhile given
that it'll only help people who happen to recompile their 8.2 server
after the next Postgres patch. And I'm not sure we can check for
patches without assuming the CC is the OS-shipped cc. Does cc itself
have an option to list which patches it has applied to it?

--
greg


Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Greg Stark
Date:
Incidentally Zdenek came to the same conclusion that it was a compiler
bug in <4AA775A9.80702@sun.com>

-- 
greg


Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Incidentally Zdenek came to the same conclusion that it was a compiler
> bug in <4AA775A9.80702@sun.com>

Drat, I had forgotten that exchange.  I reconstructed Teodor's advice
the hard way :-(
        regards, tom lane


Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My conclusion is that this is probably a compiler bug.

> Could be this:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087

Hmmm ... that doesn't seem to be quite an exact match, because the
setting and testing of the bitfield is in different functions in
different files in our case.  Still, it seems related.  It would
be useful to verify whether these two buildfarm animals are fully
up-to-date on compiler patches.
        regards, tom lane


Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Zdenek Kotala
Date:
Hi Tom,

I'm sorry that I did not look at this earlier. I played with it and here 
are some facts. gothic (SPARC) and codlin (x86) use Sun Studio 12, and I 
set them up to use very high optimization.

Gothic:
-------
-xalias_level=basic -xarch=native -xdepend -xmemalign=8s -xO5 
-xprefetch=auto,explicit

Codlin:
-------
-xalias_level=basic -xarch=native -xdepend -xO4 -xprefetch=auto,explicit

-xO5 is the highest optimization level; -xO4 is one step below it.

I played with the flags and found that:

"-xO4 -xalias_level=basic" triggers the problem.

"-xO3 -xalias_level=basic" works fine.

"-xO5" works fine.


As the documentation says, quoting the Sun Studio compiler guide:
http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view

------------------------------------------------------------------------
xalias_level=basic
------------------
If you use the -xalias_level=basic option, the compiler assumes that 
memory references that involve different C basic types do not alias each 
other. The compiler also assumes that references to all other types can 
alias each other as well as any C basic type. The compiler assumes that 
references using char * can alias any other type.

For example, at the -xalias_level=basic level, the compiler assumes that 
a pointer variable of type int * is not going to access a float object. 
Therefore it is safe for the compiler to perform optimizations that 
assume a pointer of type float * will not alias the same memory that is 
referenced with a pointer of type int *.

-xO4
----
Performs automatic inlining of functions contained in the same file in 
addition to performing -xO3 optimizations. This automatic inlining 
usually improves execution speed, but sometimes makes it worse. In 
general, this level results in increased code size.

------------------------------------------------------------------------


I redefined the bitfields in HLWORD as char and it works, so your guess 
is correct. But the question remains where exactly the bitfields are 
mishandled. Any idea where I should look?
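
For readers who want to try the same workaround, here is a minimal sketch of what "bitfields to char" can look like. It is not the real HLWORD definition: DemoHlWord, the DEMO_FLAG macro, the DEMO_USE_BITFIELDS switch and the helper functions are all hypothetical, and of the flag names only "skip" is taken from this thread.

```c
/*
 * Hypothetical switch: define DEMO_USE_BITFIELDS to get the packed
 * one-bit layout; leave it undefined to give each flag its own char.
 */
#ifdef DEMO_USE_BITFIELDS
#define DEMO_FLAG(name) unsigned int name:1
#else
#define DEMO_FLAG(name) char name
#endif

/* Simplified stand-in for a headline word entry (not the real HLWORD). */
typedef struct
{
    DEMO_FLAG(selected);
    DEMO_FLAG(skip);
    DEMO_FLAG(replace);
    const char *word;
} DemoHlWord;

/* Store a normalized 0/1 value so both layouts behave the same. */
static void
demo_set_skip(DemoHlWord *w, int on)
{
    w->skip = on ? 1 : 0;
}

/* Test the flag without depending on how it is stored. */
static int
demo_is_skipped(const DemoHlWord *w)
{
    return w->skip != 0;
}
```

With char members, each flag occupies its own byte, so storing one flag does not require reading and rewriting the bits of its neighbours. That makes the layout a useful diagnostic, although it does not by itself show where the bitfield version goes wrong.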

IIRC, I also had this problem on HEAD when I tried to fix the tsearch 
regression test for the Czech locale. The problem appears and disappears.
Zdenek




On 11.03.10 00:37, Tom Lane wrote:
> Since the buildfarm is mostly green these days, I took some time to look
> into the few remaining consistent failures.  One is that gothic_moth and
> codlin_moth fail on contrib/tsearch2 in the 8.2 branch, with a
> regression diff like this:
>
> *** 2453,2459 ****
>     <body>
>     <b>Sea</b>  view wow<u><b>foo</b>  bar</u>  <i>qq</i>
>     <a href="http://www.google.com/foo.bar.html" target="_blank">YES </a>
> !   ff-bg
>     <script>
>            document.write(15);
>     </script>
> --- 2453,2459 ----
>     <body>
>     <b>Sea</b>  view wow<u><b>foo</b>  bar</u>  <i>qq</i>
>     <a href="http://www.google.com/foo.bar.html" target="_blank">YES </a>
> !  ff-bgff-bg
>     <script>
>            document.write(15);
>     </script>
>
> These animals are not testing any branches older than 8.2.  The same
> test appears in newer branches and passes, but the code involved got
> migrated to core and probably changed around a bit.
>
> I traced through this test on my own machine and determined that the
> way it's supposed to work is like this: the tsearch parser breaks the
> string into a series of tokens that include these:
>
>     ff-bg        compound word
>     ff        compound word element
>     -        punctuation
>     bg        compound word element
>
> The highlight function is then supposed to set skip = 1 on the compound
> word, causing it to be skipped when genhl() reconstructs the text.
> The failure looks to me like the compound word is not getting skipped.
> Both the setting and the testing of the flag seem to be absolutely
> straightforward portable code; although the "skip" struct field is a
> bitfield, which is a C feature we don't use very heavily.
>
> My conclusion is that this is probably a compiler bug.  Both buildfarm
> animals appear to be using Sun Studio, although on different
> architectures which weakens the compiler-bug theory a bit.  Even though
> we are not seeing the failure in later PG branches, it would probably be
> worth looking into more closely, because if it's biting 8.2 it could
> start biting newer code as well.  But without access to a machine
> showing the problem it is difficult to do much.
>
>             regards, tom lane



Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> "-xO4 -xalias_level=basic" generates problem.
> "-xO3 -xalias_level=basic" works fine
> "-xO5" works fine

> As documentation say:

> Cite from Sun studio compiler guide:
> http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view

> xalias_level=basic
> ------------------
> If you use the -xalias_level=basic option, the compiler assumes that 
> memory references that involve different C basic types do not alias each 
> other. The compiler also assumes that references to all other types can 
> alias each other as well as any C basic type. The compiler assumes that 
> references using char * can alias any other type.

> For example, at the -xalias_level=basic level, the compiler assumes that 
> a pointer variable of type int * is not going to access a float object. 
> Therefore it is safe for the compiler to perform optimizations that 
> assume a pointer of type float * will not alias the same memory that is 
> referenced with a pointer of type int *.

I think you need to turn that off.  On gcc we use -fno-strict-aliasing
which disables the type of compiler assumption that this is talking about.
I'm not sure exactly how that might create the specific failure we are
seeing here, but I can point you to lots and lots of places in the
sources where such an assumption would break things.
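
As a rough illustration of the kind of assumption in question (a generic C sketch with hypothetical function names, not code from the PostgreSQL sources): under type-based alias analysis the compiler may assume that an int * and a float * never refer to the same memory, so it can reorder or fold accesses made through them. Copying bytes with memcpy() is one way to reinterpret a value without relying on pointer casts.

```c
#include <stdint.h>
#include <string.h>

/*
 * With type-based alias analysis enabled, the compiler may assume the
 * store through fp cannot change *ip and return the constant 1 directly.
 * That is only correct when the two pointers really refer to different
 * objects, which is what the optimizer takes for granted.
 */
static int
demo_store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}

/*
 * Reinterpret a float's bytes without pointer casts.
 * Assumes sizeof(float) == sizeof(uint32_t), which holds on common platforms.
 */
static uint32_t
demo_float_to_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse of demo_float_to_bits, under the same size assumption. */
static float
demo_bits_to_float(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Code that casts between unrelated pointer types and expects the stores to be seen through both is what breaks when the assumption is enabled, which is why disabling it (as -fno-strict-aliasing does on gcc) is the safer build setting for such a code base.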
        regards, tom lane


Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Zdenek Kotala
Date:
On 11.03.10 16:24, Greg Stark wrote:
> On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> My conclusion is that this is probably a compiler bug.  Both buildfarm
>> animals appear to be using Sun Studio, although on different
>> architectures which weakens the compiler-bug theory a bit.  Even though
>> we are not seeing the failure in later PG branches, it would probably be
>> worth looking into more closely, because if it's biting 8.2 it could
>> start biting newer code as well.  But without access to a machine
>> showing the problem it is difficult to do much.
>
>
> Could be this:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087
>
> It's fixed in patch 124861-11 which came out Feb 23, 2009. Is this
> patch missing on both gothic-moth and codlin-moth?

It looks like our case. See the compiler versions:

Ghost:
-bash-3.2$ cc -V
cc: Sun C 5.9 SunOS_sparc Patch 124867-09 2008/11/25

Codlin
-bash-4.0$ cc -V
cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30


I can apply the patch on Ghost, but Codlin will have to wait, because I 
don't control the compiler version there. I will try to find an updated 
SS12 somewhere on the disk/network.

The patch you refer to does not fix cc itself but some other 
binaries/libraries that cc uses.

I will try to update Ghost and we will see what happens.

> I suppose it's possible to have a configure test to check for whether
> this patch is present but I'm not sure how much it's worthwhile given
> that it'll only help people who happen to recompile their 8.2 server
> after the next Postgres patch. And I'm not sure we can check for
> patches without assuming the CC is the OS-shipped cc. Does cc itself
> have an option to list which patches it has applied to it?
>

cc is not shipped with Solaris; you have to install it separately. And 
the bug appears only when you use high optimization (see my email). You 
can see a patch version when you run cc -V, but it shows only the 
compiler's own patch level.
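
If someone did want to check the patch level programmatically, a minimal sketch could parse the banner line. This assumes the banner always has the shape of the two samples shown above ("... Patch NNNNNN-RR date"); the function name is hypothetical, and nothing here is part of PostgreSQL's configure.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the patch id and revision from a banner such as
 * "cc: Sun C 5.9 SunOS_sparc Patch 124867-09 2008/11/25".
 * Returns 1 on success, 0 if no "Patch NNNNNN-RR" field is found.
 */
static int
demo_parse_cc_patch(const char *banner, long *patch_id, int *revision)
{
    const char *p = strstr(banner, "Patch ");

    if (p == NULL)
        return 0;
    return sscanf(p, "Patch %ld-%d", patch_id, revision) == 2;
}
```

For the two banners quoted above this yields 124867 revision 9 and 124868 revision 10. As noted, that number describes the compiler front end only, so it cannot confirm whether a fix delivered in a different patch id is installed.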
Zdenek



Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Zdenek Kotala
Date:
On 11.03.10 17:37, Tom Lane wrote:
> Zdenek Kotala<Zdenek.Kotala@Sun.COM>  writes:
>> "-xO4 -xalias_level=basic" generates problem.
>> "-xO3 -xalias_level=basic" works fine
>> "-xO5" works fine
>
>> As documentation say:
>
>> Cite from Sun studio compiler guide:
>> http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view
>
>> xalias_level=basic
>> ------------------
>> If you use the -xalias_level=basic option, the compiler assumes that
>> memory references that involve different C basic types do not alias each
>> other. The compiler also assumes that references to all other types can
>> alias each other as well as any C basic type. The compiler assumes that
>> references using char * can alias any other type.
>
>> For example, at the -xalias_level=basic level, the compiler assumes that
>> a pointer variable of type int * is not going to access a float object.
>> Therefore it is safe for the compiler to perform optimizations that
>> assume a pointer of type float * will not alias the same memory that is
>> referenced with a pointer of type int *.
>
> I think you need to turn that off.  On gcc we use -fno-strict-aliasing
> which disables the type of compiler assumption that this is talking about.
> I'm not sure exactly how that might create the specific failure we are
> seeing here, but I can point you to lots and lots of places in the
> sources where such an assumption would break things.

OK. I will first try updating the compiler to the latest version to see 
if it helps, and after that I will remove the aliasing option.
Thanks, Zdenek


Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

From
Zdenek Kotala
Date:
On Thu, 11 Mar 2010 at 11:37 -0500, Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> > "-xO4 -xalias_level=basic" generates problem.
> > "-xO3 -xalias_level=basic" works fine
> > "-xO5" works fine
> 
> > As documentation say:
> 
> > Cite from Sun studio compiler guide:
> > http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view
> 
> > xalias_level=basic
> > ------------------
> > If you use the -xalias_level=basic option, the compiler assumes that 
> > memory references that involve different C basic types do not alias each 
> > other. The compiler also assumes that references to all other types can 
> > alias each other as well as any C basic type. The compiler assumes that 
> > references using char * can alias any other type.
> 
> > For example, at the -xalias_level=basic level, the compiler assumes that 
> > a pointer variable of type int * is not going to access a float object. 
> > Therefore it is safe for the compiler to perform optimizations that 
> > assume a pointer of type float * will not alias the same memory that is 
> > referenced with a pointer of type int *.
> 
> I think you need to turn that off.  On gcc we use -fno-strict-aliasing
> which disables the type of compiler assumption that this is talking about.
> I'm not sure exactly how that might create the specific failure we are
> seeing here, but I can point you to lots and lots of places in the
> sources where such an assumption would break things.

Reconfigured, and both animals are green.
Zdenek