Thread: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

On Sun, 4 Sep 2011, Tom Lane wrote:

> Jeremy Drake <jeremyd@jdrake.com> writes:
> > I didn't see any changes that looked like they affected
> > CurrentMemoryContext, but I attached the compressed context diff in case
> > you want to look at it.
>
> Right now I have a feeling that this is a compiler bug.

That's my feeling, also.

> Don't know
> whether you have the interest/energy to try to reduce it to a reportable
> test case.

If you mean reporting it to the compiler vendor (Intel), I doubt that
would be worthwhile.  The version of the compiler on this machine is very
out of date.  It is version 9.0 20060222.  I would bet that if I did track
down and report an issue in a 5-year-old compiler version, their first
question would be, does the issue duplicate in the current version.  Given
that my other buildfarm member is running 11.1 20100414 (albeit for the
x64 platform instead of the x86) and had no issue, I would expect that the
current x86 version would also have no problem.

I have intentionally been keeping the compiler versions on my buildfarm
members pretty much fixed, for the benefit of reproducable results.
However, I would be interested in hearing any guidelines on "how old is
too old" for buildfarm member versions.



Jeremy Drake <pgsql@jdrake.com> writes:
> On Sun, 4 Sep 2011, Tom Lane wrote:
>> Right now I have a feeling that this is a compiler bug.

> That's my feeling, also.

>> Don't know
>> whether you have the interest/energy to try to reduce it to a reportable
>> test case.

> If you mean reporting it to the compiler vendor (Intel), I doubt that
> would be worthwhile.  The version of the compiler on this machine is very
> out of date.  It is version 9.0 20060222.  I would bet that if I did track
> down and report an issue in a 5-year-old compiler version, their first
> question would be, does the issue duplicate in the current version.  Given
> that my other buildfarm member is running 11.1 20100414 (albeit for the
> x64 platform instead of the x86) and had no issue, I would expect that the
> current x86 version would also have no problem.

> I have intentionally been keeping the compiler versions on my buildfarm
> members pretty much fixed, for the benefit of reproducable results.
> However, I would be interested in hearing any guidelines on "how old is
> too old" for buildfarm member versions.

Well, I'm still running this on one development machine:

$ gcc -v
Reading specs from /usr/local/lib/gcc-lib/hppa2.0-hp-hpux10.20/2.95.3/specs
gcc version 2.95.3 20010315 (release)

so I'm not in the "it's too old" camp.  My perspective on tool bugs
is whether there is something (a) well defined and (b) not costly
that we can do to work around it.  So far, this issue is failing
test (a) ... we know that one header inclusion order triggers the
crash and another not, but there is no theory as to why.

What I would suggest is to see whether a more recent x86 version shows
the problem or not.  If not, let's just write it off as an already-fixed
compiler bug.
        regards, tom lane


On Sun, 4 Sep 2011, Tom Lane wrote:

> What I would suggest is to see whether a more recent x86 version shows
> the problem or not.  If not, let's just write it off as an already-fixed
> compiler bug.

I have installed the most recent version in the home directory of a
purpose-made user on that machine.

configure:3252: icc --version >&5
icc (ICC) 12.0.5 20110719
Copyright (C) 1985-2011 Intel Corporation.  All rights reserved.

I did
git checkout 6416a82a62db4e66b2edb0fa8fc83a580c3f1931
to get a revision I knew was right in the broken range for mongoose.

Apparently they deprecated one of my compiler flags: -xN (N is for
Nocona), seems they renamed it to -xSSE2.  Since this is a one-off run, I
ignored that warning.

The result is no crash in the cube test.

I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
and if the issue duplicates there, I can see about setting up SSH access
if anyone is still interested in investigating this further.



Jeremy Drake wrote:
> On Sun, 4 Sep 2011, Tom Lane wrote:
> 
> > What I would suggest is to see whether a more recent x86 version shows
> > the problem or not.  If not, let's just write it off as an already-fixed
> > compiler bug.
> 
> I have installed the most recent version in the home directory of a
> purpose-made user on that machine.
> 
> configure:3252: icc --version >&5
> icc (ICC) 12.0.5 20110719
> Copyright (C) 1985-2011 Intel Corporation.  All rights reserved.
> 
> I did
> git checkout 6416a82a62db4e66b2edb0fa8fc83a580c3f1931
> to get a revision I knew was right in the broken range for mongoose.
> 
> Apparently they deprecated one of my compiler flags: -xN (N is for
> Nocona), seems they renamed it to -xSSE2.  Since this is a one-off run, I
> ignored that warning.
> 
> The result is no crash in the cube test.
> 
> I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
> and if the issue duplicates there, I can see about setting up SSH access
> if anyone is still interested in investigating this further.

What would we investigate except a compiler bug?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


On Mon, 5 Sep 2011, Bruce Momjian wrote:

> Jeremy Drake wrote:
> > I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
> > and if the issue duplicates there, I can see about setting up SSH access
> > if anyone is still interested in investigating this further.
>
> What would we investigate except a compiler bug?

To me, simply chalking it up to some uncharacterized compiler bug is still
quite a bit of black magic.

But, if that explanation is good enough for you, I've certainly got
better things to do with my holiday than spending time on this :)



Jeremy Drake wrote:
> On Mon, 5 Sep 2011, Bruce Momjian wrote:
> 
> > Jeremy Drake wrote:
> > > I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
> > > and if the issue duplicates there, I can see about setting up SSH access
> > > if anyone is still interested in investigating this further.
> >
> > What would we investigate except a compiler bug?
> 
> To me, simply chalking it up to some uncharacterized compiler bug is still
> quite a bit of black magic.
> 
> But, if that explanation is good enough for you, I've certainly got
> better things to do with my holiday than spending time on this :)

If the underlying tools are buggy, the system can't be reliable.   We
can't invest time to track down every compiler bug, especially when
there are later compiler versions available.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Jeremy Drake <pgsql@jdrake.com> writes:
> On Mon, 5 Sep 2011, Bruce Momjian wrote:
>> What would we investigate except a compiler bug?

> To me, simply chalking it up to some uncharacterized compiler bug is still
> quite a bit of black magic.

If there were some reason to believe either that it wasn't a compiler
bug, or that there would be something reasonable we could do to work
around it, then I'd be interested in pressing further.  But on the
strength of what we have now, neither of those things seem real likely.
I'm with Bruce on thinking that it's probably not going to repay further
effort.
        regards, tom lane