Thread: buildfarm animal shoveler failing with "Illegal instruction"

From: Andres Freund
Hi Mark,

shoveler has been failing for a while with an odd error. E.g.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shoveler&dt=2020-09-18%2014%3A01%3A48

Illegal instruction
pg_dumpall: error: pg_dump failed on database "template1", exiting
waiting for server to shut down.... done

None of the changes in that time frame look like they're likely causing
illegal instructions to be emitted that weren't before. So I am
wondering if anything changed on that machine around 2020-09-18
14:01:48?

Greetings,

Andres Freund



Re: buildfarm animal shoveler failing with "Illegal instruction"

From: Mark Wong
On Thu, Oct 01, 2020 at 12:12:44PM -0700, Andres Freund wrote:
> Hi Mark,
> 
> shoveler has been failing for a while with an odd error. E.g.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shoveler&dt=2020-09-18%2014%3A01%3A48
> 
> Illegal instruction
> pg_dumpall: error: pg_dump failed on database "template1", exiting
> waiting for server to shut down.... done
> 
> None of the changes in that time frame look like they're likely causing
> illegal instructions to be emitted that weren't before. So I am
> wondering if anything changed on that machine around 2020-09-18
> 14:01:48?

It looks like the last package update was 2020-06-10 06:59:26, according
to the apt logs.

I'm getting Tom set up with access too, in case he has time before me to
get a stack trace to see what's happening...

Regards,
Mark



Re: buildfarm animal shoveler failing with "Illegal instruction"

From: Tom Lane
Mark Wong <mark@2ndquadrant.com> writes:
> I'm getting Tom set up with access too, in case he has time before me to
> get a stack trace to see what's happening...

tl;dr: it's hard to conclude that this is anything but a compiler bug.

I was able to reproduce this on shoveler's host, but only when using
the compiler shoveler uses (clang-3.9), not the 6.3 gcc that's also
on there and is of similar vintage.  Observations:

* You don't need any complicated test case; "pg_dump template1"
is enough.

* Reverting 1ed6b8956's addition of a "postfix operators are not supported
anymore" warning to dumpOpr() makes it go away.  This despite the fact
that that code is never reached when dumping template1.  (We do enter
dumpOpr, but the oprinfo->dobj.dump test always fails.)

* Reducing the optimization level to -O1 or -O0 makes it go away.

* Inserting a debugging fprintf in dumpOpr makes it go away.

Since clang 3.9 is several years old, maybe we could move shoveler
up to a newer version?  Or dial it down to -O1 optimization?

            regards, tom lane
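
The narrowing-down described above (vary the compiler and optimization level, then run "pg_dump template1") could be scripted roughly as follows. This is only a sketch under assumptions: the compiler names, the use of -O2 as the default level, and the helper names configure_args, describe_exit and run_pg_dump are illustrative and not taken from the buildfarm configuration. It relies on the POSIX convention that Python's subprocess module reports a child killed by signal N as return code -N, so an "Illegal instruction" crash shows up as SIGILL.

```python
import signal
import subprocess

# Assumptions for illustration only: "gcc" stands in for the 6.3 gcc on the
# host, and -O2 is taken to be the default optimization level.
COMPILERS = ["clang-3.9", "gcc"]
OPT_LEVELS = ["-O2", "-O1", "-O0"]


def configure_args(cc, opt_level):
    """Build an autoconf-style configure command for one combination."""
    return ["./configure", f"CC={cc}", f"CFLAGS={opt_level}"]


def describe_exit(returncode):
    """Turn a subprocess return code into a short description.

    On POSIX, a child terminated by signal N is reported as -N.
    """
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "ok"
    return f"exited with status {returncode}"


def run_pg_dump(pg_dump_path):
    """Run "pg_dump template1" (needs a running server) and describe the result."""
    result = subprocess.run([pg_dump_path, "template1"],
                            stdout=subprocess.DEVNULL)
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Print the configure invocations to try; building and running is manual.
    for cc in COMPILERS:
        for opt in OPT_LEVELS:
            print(" ".join(configure_args(cc, opt)))
```

After building each combination, calling run_pg_dump on the freshly built binary would be expected to report "killed by SIGILL" only for the failing combination, if the observations above hold.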



Re: buildfarm animal shoveler failing with "Illegal instruction"

From: Mark Wong
On Thu, Oct 01, 2020 at 09:12:53PM -0400, Tom Lane wrote:
> Mark Wong <mark@2ndquadrant.com> writes:
> > I'm getting Tom set up with access too, in case he has time before me to
> > get a stack trace to see what's happening...
> 
> tl;dr: it's hard to conclude that this is anything but a compiler bug.
> 
> I was able to reproduce this on shoveler's host, but only when using
> the compiler shoveler uses (clang-3.9), not the 6.3 gcc that's also
> on there and is of similar vintage.  Observations:
> 
> * You don't need any complicated test case; "pg_dump template1"
> is enough.
> 
> * Reverting 1ed6b8956's addition of a "postfix operators are not supported
> anymore" warning to dumpOpr() makes it go away.  This despite the fact
> that that code is never reached when dumping template1.  (We do enter
> dumpOpr, but the oprinfo->dobj.dump test always fails.)
> 
> * Reducing the optimization level to -O1 or -O0 makes it go away.
> 
> * Inserting a debugging fprintf in dumpOpr makes it go away.
> 
> Since clang 3.9 is several years old, maybe we could move shoveler
> up to a newer version?  Or dial it down to -O1 optimization?

There is ayu, same system with clang 4.0, so covered on that front.

I went ahead and stopped the jobs that run with clang 3.9.  This is
the same system that was running clang 3.8, too.  I tried looking for EOL
dates, but had trouble finding anything...  But I can change the
optimization flag if we want it back.

Regards,
Mark
-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: buildfarm animal shoveler failing with "Illegal instruction"

From: Andres Freund
On 2020-10-02 10:45:58 -0700, Mark Wong wrote:
> I went ahead and stopped the jobs that run with clang 3.9.  This is
> the same system that was running clang 3.8, too.  I tried looking for EOL
> dates, but had trouble finding anything...  But I can change the
> optimization flag if we want it back.

LLVM officially supports only the latest minor version, and does only one
or two point releases for it. 3.9 and 3.8 are long past EOL.



Re: buildfarm animal shoveler failing with "Illegal instruction"

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> On 2020-10-02 10:45:58 -0700, Mark Wong wrote:
>> I went ahead and stopped the jobs that run with clang 3.9.  This is
>> the same system that was running clang 3.8, too.  I tried looking for EOL
>> dates, but had trouble finding anything...  But I can change the
>> optimization flag if we want it back.

> LLVM officially supports only the latest minor version, and does only one
> or two point releases for it. 3.9 and 3.8 are long past EOL.

I thought about asking Mark to re-enable it at -O1, but we have recent
experience reminding us that non-default optimization levels are likely
to be even buggier than the default [1].  So that's probably not a
productive answer.  We might as well just retire the animal.

            regards, tom lane

[1] https://www.postgresql.org/message-id/1934344.1596305790@sss.pgh.pa.us