Thread: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Michael Paquier
Date:
Hi all,

serinus has been complaining about the new gcd functions in 13~:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2021-06-03%2003%3A44%3A14

The overflow detection is going wrong the way up and down, like here:
 SELECT gcd((-9223372036854775808)::int8, (-9223372036854775808)::int8); -- overflow
-ERROR:  bigint out of range
+         gcd
+----------------------
+ -9223372036854775808
+(1 row)

That seems like a compiler bug to me as this host uses recent GCC
snapshots, and I cannot see a problem in GCC 10.2 on my own dev box.
But perhaps I am missing something?

Thanks,
--
Michael

Attachment

Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Dean Rasheed
Date:
On Thu, 3 Jun 2021 at 08:26, Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi all,
>
> serinus has been complaining about the new gcd functions in 13~:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2021-06-03%2003%3A44%3A14
>
> The overflow detection is going wrong the way up and down, like here:
>  SELECT gcd((-9223372036854775808)::int8, (-9223372036854775808)::int8); -- overflow
> -ERROR:  bigint out of range
> +         gcd
> +----------------------
> + -9223372036854775808
> +(1 row)
>
> That seems like a compiler bug to me as this host uses recent GCC
> snapshots, and I cannot see a problem in GCC 10.2 on my own dev box.
> But perhaps I am missing something?
>

Huh, yeah. The code is pretty clear that that should throw an error:

    if (arg1 == PG_INT64_MIN)
    {
        if (arg2 == 0 || arg2 == PG_INT64_MIN)
            ereport(ERROR,
                    (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                     errmsg("bigint out of range")));

and FWIW it works OK on my dev box with gcc 10.2.1 and the same cflags.

Regards,
Dean



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Sergei Kornilov
Date:
Hello

I build gcc version 12.0.0 20210603 (experimental) locally, then tried REL_13_STABLE with same CFLAGS as serinus
./configure --prefix=/home/melkij/tmp/pgdev/inst CFLAGS="-O3 -ggdb -g3 -Wall -Wextra -Wno-unused-parameter
-Wno-sign-compare-Wno-missing-field-initializers" --enable-tap-tests --enable-cassert --enable-debug
 
check-world passed.

regards, Sergei



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> serinus has been complaining about the new gcd functions in 13~:

moonjelly, which also runs a bleeding-edge gcc, started to fail the same
way at about the same time.  Given that none of our code in that area
has changed, it's hard to think it's anything but a broken compiler.
Maybe somebody should report that to gcc upstream?

            regards, tom lane



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Fabien COELHO
Date:
>> serinus has been complaining about the new gcd functions in 13~:
>
> moonjelly, which also runs a bleeding-edge gcc, started to fail the same
> way at about the same time.  Given that none of our code in that area
> has changed, it's hard to think it's anything but a broken compiler.

> Maybe somebody should report that to gcc upstream?

Yes.

I will isolate a small case (hopefully) and do a report over week-end, 
after checking that the latest version is still broken.

-- 
Fabien.



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Fabien COELHO
Date:
>>> serinus has been complaining about the new gcd functions in 13~:
>> 
>> moonjelly, which also runs a bleeding-edge gcc, started to fail the same
>> way at about the same time.  Given that none of our code in that area
>> has changed, it's hard to think it's anything but a broken compiler.
>
>> Maybe somebody should report that to gcc upstream?
>
> Yes.
>
> I will isolate a small case (hopefully) and do a report over week-end, after 
> checking that the latest version is still broken.

Not needed in the end, the problem has disappeared with today's 
gcc recompilation.

-- 
Fabien.



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Alvaro Herrera
Date:
On 2021-Jun-03, Michael Paquier wrote:

> Hi all,
> 
> serinus has been complaining about the new gcd functions in 13~:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2021-06-03%2003%3A44%3A14

Hello, this problem is still happening; serinus' configure output says
it's running a snapshot from 20210527, and Fabien mentioned downthread
that his compiler stopped complaining on 2021-06-05.  Andres, maybe the
compiler in serinus is due for an update too?

Thanks

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Once again, thank you and all of the developers for your hard work on
PostgreSQL.  This is by far the most pleasant management experience of
any database I've worked on."                             (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Hello, this problem is still happening; serinus' configure output says
> it's running a snapshot from 20210527, and Fabien mentioned downthread
> that his compiler stopped complaining on 2021-06-05.  Andres, maybe the
> compiler in serinus is due for an update too?

Yeah, serinus is visibly still running one of the broken revisions:

configure: using compiler=gcc (Debian 20210527-1) 12.0.0 20210527 (experimental) [master revision
262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973]

It'd sure be nice if seawasp stopped spamming the buildfarm failure log,
too.  That seems to be a different issue:

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f3827947859 in __GI_abort () at abort.c:79
#2  0x00007f3827947729 in __assert_fail_base (fmt=0x7f3827add588 "%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n",
assertion=0x7f381c3ce4c8"S->getValue() && \\"Releasing SymbolStringPtr with zero ref count\\"", file=0x7f381c3ce478
"/home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h",line=91, function=<optimized out>) at
assert.c:92
#3  0x00007f3827958f36 in __GI___assert_fail (assertion=0x7f381c3ce4c8 "S->getValue() && \\"Releasing SymbolStringPtr
withzero ref count\\"", file=0x7f381c3ce478
"/home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h",line=91, function=0x7f381c3ce570
"llvm::orc::SymbolStringPtr::~SymbolStringPtr()")at assert.c:101 
#4  0x00007f381c23c98d in llvm::orc::SymbolStringPtr::~SymbolStringPtr (this=0x277a8b0, __in_chrg=<optimized out>) at
/home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h:91
#5  0x00007f381c24f879 in std::_Destroy<llvm::orc::SymbolStringPtr> (__pointer=0x277a8b0) at
/home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:140
#6  0x00007f381c24d10c in std::_Destroy_aux<false>::__destroy<llvm::orc::SymbolStringPtr*> (__first=0x277a8b0,
__last=0x277a998)at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:152 
#7  0x00007f381c2488a6 in std::_Destroy<llvm::orc::SymbolStringPtr*> (__first=0x277a8b0, __last=0x277a998) at
/home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:185
#8  0x00007f381c243c51 in std::_Destroy<llvm::orc::SymbolStringPtr*, llvm::orc::SymbolStringPtr> (__first=0x277a8b0,
__last=0x277a998)at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/alloc_traits.h:738 
#9  0x00007f381c23f1c3 in std::vector<llvm::orc::SymbolStringPtr, std::allocator<llvm::orc::SymbolStringPtr> >::~vector
(this=0x7ffc73440a10,__in_chrg=<optimized out>) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_vector.h:680 
#10 0x00007f381c26112c in llvm::orc::JITDylib::removeTracker (this=0x18b4240, RT=...) at
/home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:1464
#11 0x00007f381c264cb9 in operator() (__closure=0x7ffc73440d00) at
/home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:2054
#12 0x00007f381c264d29 in
llvm::orc::ExecutionSession::runSessionLocked<llvm::orc::ExecutionSession::removeResourceTracker(llvm::orc::ResourceTracker&)::<lambda()>
>(struct{...} &&) (this=0x187d110, F=...) at /home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/Core.h:1291 
#13 0x00007f381c264ebc in llvm::orc::ExecutionSession::removeResourceTracker (this=0x187d110, RT=...) at
/home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:2051
#14 0x00007f381c25734c in llvm::orc::ResourceTracker::remove (this=0x1910c30) at
/home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:53
#15 0x00007f381c25a9c1 in llvm::orc::JITDylib::clear (this=0x18b4240) at
/home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:622
#16 0x00007f381c26305e in llvm::orc::ExecutionSession::endSession (this=0x187d110) at
/home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:1777
#17 0x00007f381c3373a3 in llvm::orc::LLJIT::~LLJIT (this=0x18a73b0, __in_chrg=<optimized out>) at
/home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp:1002
#18 0x00007f381c38af48 in LLVMOrcDisposeLLJIT (J=0x18a73b0) at
/home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/OrcV2CBindings.cpp:561
#19 0x00007f381c596612 in llvm_shutdown (code=<optimized out>, arg=140722242323824) at llvmjit.c:892
#20 0x00000000007d4385 in proc_exit_prepare (code=code@entry=0) at ipc.c:209
#21 0x00000000007d4288 in proc_exit (code=code@entry=0) at ipc.c:107


            regards, tom lane



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Fabien COELHO
Date:
> It'd sure be nice if seawasp stopped spamming the buildfarm failure log,
> too.

There was a silent API breakage (same API, different behavior, how nice…) 
in llvm main that Andres figured out, which will have to be fixed at some 
point, so this is reminder that it is still a todo… Not sure when a fix is 
planned, though. I'm afraid portability may require that different code is 
executed depending on llvm version. Or maybe we should wrestle a revert on 
llvm side? Hmmm…

So I'm not very confident that the noise will go away quickly, sorry.

-- 
Fabien.

Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> It'd sure be nice if seawasp stopped spamming the buildfarm failure log,
>> too.

> There was a silent API breakage (same API, different behavior, how nice…) 
> in llvm main that Andres figured out, which will have to be fixed at some 
> point, so this is reminder that it is still a todo…

If it were *our* todo, that would be one thing; but it isn't.

> Not sure when a fix is 
> planned, though. I'm afraid portability may require that different code is 
> executed depending on llvm version. Or maybe we should wrestle a revert on 
> llvm side? Hmmm…

> So I'm not very confident that the noise will go away quickly, sorry.

Could you please just shut down the animal until that's dealt with?
It's extremely unpleasant to have to root through a lot of useless
failures to find the ones that might be of interest.  Right now
serinus and seawasp are degrading this report nearly to uselessness:

https://buildfarm.postgresql.org/cgi-bin/show_failures.pl

            regards, tom lane



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Fabien COELHO
Date:
Hello Tom,

>> So I'm not very confident that the noise will go away quickly, sorry.
>
> Could you please just shut down the animal until that's dealt with?

Hmmm… Obviously I can.

However, please note that the underlying logic of "a test is failing, 
let's just remove it" does not sound right to me at all:-(

The test is failing because there is a problem, and shuting down the test 
to improve a report does not in any way help to fix it, it just helps to 
hide it.

> It's extremely unpleasant to have to root through a lot of useless
> failures

I do not understand how they are useless. Pg does not work properly with 
current LLVM, and keeps on not working. I think that this information is 
worthy, even if I do not like it and would certainly prefer a quick fix.

> to find the ones that might be of interest.  Right now serinus and 
> seawasp are degrading this report nearly to uselessness:
>
> https://buildfarm.postgresql.org/cgi-bin/show_failures.pl

IMHO, the report should be improved, not the test removed.

If you insist I will shut down the animal, bit I'd prefer not to.

I think that the reminder has value, and just because some report is not 
designed to handle this nicely does not seem like a good reason to do 
that.

-- 
Fabien.

Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Could you please just shut down the animal until that's dealt with?

> The test is failing because there is a problem, and shuting down the test 
> to improve a report does not in any way help to fix it, it just helps to 
> hide it.

Our buildfarm is run for the use of the Postgres project, not the LLVM
project.  I'm not really happy that it contains any experimental-compiler
animals at all, but as long as they're unobtrusive I can stand it.
serinus and seawasp are being the opposite of unobtrusive.

If you don't want to shut it down entirely, maybe backing it off to
run only once a week would be an acceptable compromise.  Since you
only update its compiler version once a week, I doubt we learn much
from runs done more often than that anyway.

            regards, tom lane



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Fabien COELHO
Date:
Hello Tom,

>>> Could you please just shut down the animal until that's dealt with?
>
>> The test is failing because there is a problem, and shuting down the test
>> to improve a report does not in any way help to fix it, it just helps to
>> hide it.
>
> Our buildfarm is run for the use of the Postgres project, not the LLVM
> project.

The point of these animals is to have early warning of upcoming compiler 
changes. Given the release cycle of the project and the fact that a 
version is expected to work for 5 years, this is a clear benefit for 
postgres, IMO. When the compiler is broken, it is noisy, too bad.

In this instance the compiler is not broken, but postgres is.

If the consensus is that these animals are useless, I'll remove them, and 
be sad that the community is not able to see their value.

> I'm not really happy that it contains any experimental-compiler
> animals at all, but as long as they're unobtrusive I can stand it.
> serinus and seawasp are being the opposite of unobtrusive.

I think that the problem is the report, not the failing animal.

In French we say "ce n’est pas en cassant le thermomètre qu’on fait tomber 
la fièvre", which is an equivalent of "don't shoot the messenger".

> If you don't want to shut it down entirely, maybe backing it off to
> run only once a week would be an acceptable compromise.  Since you
> only update its compiler version once a week, I doubt we learn much
> from runs done more often than that anyway.

Hmmm… I can slow it down. We will wait one week to learn that the problems 
have been fixed, wow.

<Sigh>.

-- 
Fabien.

Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Thomas Munro
Date:
On Sat, Jun 19, 2021 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fabien COELHO <coelho@cri.ensmp.fr> writes:
> >> It'd sure be nice if seawasp stopped spamming the buildfarm failure log,
> >> too.
>
> > There was a silent API breakage (same API, different behavior, how nice…)
> > in llvm main that Andres figured out, which will have to be fixed at some
> > point, so this is reminder that it is still a todo…
>
> If it were *our* todo, that would be one thing; but it isn't.

Over on the other thread[1] we learned that this is an API change
affecting reference counting semantics[2], so unless there is some
discussion somewhere about reverting the LLVM change that I'm unaware
of, I'm guessing we're going to need to change our code sooner or
later.  I have a bleeding edge LLVM on my dev machine, and I'm willing
to try to reproduce the crash and write the trivial patch (that is,
figure out the right preprocessor incantation to detect the version or
feature, and bump the reference count as appropriate), if Andres
and/or Fabien aren't already on the case.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGLEy8mgtN7BNp0ooFAjUedDTJj5dME7NxLU-m91b85siA%40mail.gmail.com
[2] https://github.com/llvm/llvm-project/commit/c8fc5e3ba942057d6c4cdcd1faeae69a28e7b671



Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

From
Fabien COELHO
Date:
>>> There was a silent API breakage (same API, different behavior, how nice…)
>>> in llvm main that Andres figured out, which will have to be fixed at some
>>> point, so this is reminder that it is still a todo…
>>
>> If it were *our* todo, that would be one thing; but it isn't.
>
> Over on the other thread[1] we learned that this is an API change
> affecting reference counting semantics[2], so unless there is some
> discussion somewhere about reverting the LLVM change that I'm unaware
> of, I'm guessing we're going to need to change our code sooner or
> later.

Indeed, I'm afraid the solution will have to be on pg side.

> I have a bleeding edge LLVM on my dev machine, and I'm willing to try to 
> reproduce the crash and write the trivial patch (that is, figure out the 
> right preprocessor incantation to detect the version or feature, and 
> bump the reference count as appropriate), if Andres and/or Fabien aren't 
> already on the case.

I'm not in the case, I'm only the one running the farm animal which barks 
too annoyingly for Tom.

-- 
Fabien.