Thread: Failures with gcd functions with GCC snapshots GCC and -O3 (?)
Hi all, serinus has been complaining about the new gcd functions in 13~: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2021-06-03%2003%3A44%3A14 The overflow detection is going wrong the way up and down, like here: SELECT gcd((-9223372036854775808)::int8, (-9223372036854775808)::int8); -- overflow -ERROR: bigint out of range + gcd +---------------------- + -9223372036854775808 +(1 row) That seems like a compiler bug to me as this host uses recent GCC snapshots, and I cannot see a problem in GCC 10.2 on my own dev box. But perhaps I am missing something? Thanks, -- Michael
Attachment
On Thu, 3 Jun 2021 at 08:26, Michael Paquier <michael@paquier.xyz> wrote: > > Hi all, > > serinus has been complaining about the new gcd functions in 13~: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2021-06-03%2003%3A44%3A14 > > The overflow detection is going wrong the way up and down, like here: > SELECT gcd((-9223372036854775808)::int8, (-9223372036854775808)::int8); -- overflow > -ERROR: bigint out of range > + gcd > +---------------------- > + -9223372036854775808 > +(1 row) > > That seems like a compiler bug to me as this host uses recent GCC > snapshots, and I cannot see a problem in GCC 10.2 on my own dev box. > But perhaps I am missing something? > Huh, yeah. The code is pretty clear that that should throw an error: if (arg1 == PG_INT64_MIN) { if (arg2 == 0 || arg2 == PG_INT64_MIN) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("bigint out of range"))); and FWIW it works OK on my dev box with gcc 10.2.1 and the same cflags. Regards, Dean
Hello I build gcc version 12.0.0 20210603 (experimental) locally, then tried REL_13_STABLE with same CFLAGS as serinus ./configure --prefix=/home/melkij/tmp/pgdev/inst CFLAGS="-O3 -ggdb -g3 -Wall -Wextra -Wno-unused-parameter -Wno-sign-compare-Wno-missing-field-initializers" --enable-tap-tests --enable-cassert --enable-debug check-world passed. regards, Sergei
Michael Paquier <michael@paquier.xyz> writes: > serinus has been complaining about the new gcd functions in 13~: moonjelly, which also runs a bleeding-edge gcc, started to fail the same way at about the same time. Given that none of our code in that area has changed, it's hard to think it's anything but a broken compiler. Maybe somebody should report that to gcc upstream? regards, tom lane
>> serinus has been complaining about the new gcd functions in 13~: > > moonjelly, which also runs a bleeding-edge gcc, started to fail the same > way at about the same time. Given that none of our code in that area > has changed, it's hard to think it's anything but a broken compiler. > Maybe somebody should report that to gcc upstream? Yes. I will isolate a small case (hopefully) and do a report over week-end, after checking that the latest version is still broken. -- Fabien.
>>> serinus has been complaining about the new gcd functions in 13~: >> >> moonjelly, which also runs a bleeding-edge gcc, started to fail the same >> way at about the same time. Given that none of our code in that area >> has changed, it's hard to think it's anything but a broken compiler. > >> Maybe somebody should report that to gcc upstream? > > Yes. > > I will isolate a small case (hopefully) and do a report over week-end, after > checking that the latest version is still broken. Not needed in the end, the problem has disappeared with today's gcc recompilation. -- Fabien.
On 2021-Jun-03, Michael Paquier wrote: > Hi all, > > serinus has been complaining about the new gcd functions in 13~: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2021-06-03%2003%3A44%3A14 Hello, this problem is still happening; serinus' configure output says it's running a snapshot from 20210527, and Fabien mentioned downthread that his compiler stopped complaining on 2021-06-05. Andres, maybe the compiler in serinus is due for an update too? Thanks -- Álvaro Herrera 39°49'30"S 73°17'W "Once again, thank you and all of the developers for your hard work on PostgreSQL. This is by far the most pleasant management experience of any database I've worked on." (Dan Harris) http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Hello, this problem is still happening; serinus' configure output says > it's running a snapshot from 20210527, and Fabien mentioned downthread > that his compiler stopped complaining on 2021-06-05. Andres, maybe the > compiler in serinus is due for an update too? Yeah, serinus is visibly still running one of the broken revisions: configure: using compiler=gcc (Debian 20210527-1) 12.0.0 20210527 (experimental) [master revision 262e75d22c3:7bb6b9b2f47:9d3a953ec4d2695e9a6bfa5f22655e2aea47a973] It'd sure be nice if seawasp stopped spamming the buildfarm failure log, too. That seems to be a different issue: Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f3827947859 in __GI_abort () at abort.c:79 #2 0x00007f3827947729 in __assert_fail_base (fmt=0x7f3827add588 "%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n", assertion=0x7f381c3ce4c8"S->getValue() && \\"Releasing SymbolStringPtr with zero ref count\\"", file=0x7f381c3ce478 "/home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h",line=91, function=<optimized out>) at assert.c:92 #3 0x00007f3827958f36 in __GI___assert_fail (assertion=0x7f381c3ce4c8 "S->getValue() && \\"Releasing SymbolStringPtr withzero ref count\\"", file=0x7f381c3ce478 "/home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h",line=91, function=0x7f381c3ce570 "llvm::orc::SymbolStringPtr::~SymbolStringPtr()")at assert.c:101 #4 0x00007f381c23c98d in llvm::orc::SymbolStringPtr::~SymbolStringPtr (this=0x277a8b0, __in_chrg=<optimized out>) at /home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h:91 #5 0x00007f381c24f879 in std::_Destroy<llvm::orc::SymbolStringPtr> (__pointer=0x277a8b0) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:140 #6 0x00007f381c24d10c in std::_Destroy_aux<false>::__destroy<llvm::orc::SymbolStringPtr*> (__first=0x277a8b0, __last=0x277a998)at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:152 #7 0x00007f381c2488a6 in std::_Destroy<llvm::orc::SymbolStringPtr*> (__first=0x277a8b0, __last=0x277a998) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_construct.h:185 #8 0x00007f381c243c51 in std::_Destroy<llvm::orc::SymbolStringPtr*, llvm::orc::SymbolStringPtr> (__first=0x277a8b0, __last=0x277a998)at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/alloc_traits.h:738 #9 0x00007f381c23f1c3 in std::vector<llvm::orc::SymbolStringPtr, std::allocator<llvm::orc::SymbolStringPtr> >::~vector (this=0x7ffc73440a10,__in_chrg=<optimized out>) at /home/fabien/gcc-10-bin/include/c++/10.3.1/bits/stl_vector.h:680 #10 0x00007f381c26112c in llvm::orc::JITDylib::removeTracker (this=0x18b4240, RT=...) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:1464 #11 0x00007f381c264cb9 in operator() (__closure=0x7ffc73440d00) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:2054 #12 0x00007f381c264d29 in llvm::orc::ExecutionSession::runSessionLocked<llvm::orc::ExecutionSession::removeResourceTracker(llvm::orc::ResourceTracker&)::<lambda()> >(struct{...} &&) (this=0x187d110, F=...) at /home/fabien/llvm-src/llvm/include/llvm/ExecutionEngine/Orc/Core.h:1291 #13 0x00007f381c264ebc in llvm::orc::ExecutionSession::removeResourceTracker (this=0x187d110, RT=...) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:2051 #14 0x00007f381c25734c in llvm::orc::ResourceTracker::remove (this=0x1910c30) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:53 #15 0x00007f381c25a9c1 in llvm::orc::JITDylib::clear (this=0x18b4240) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:622 #16 0x00007f381c26305e in llvm::orc::ExecutionSession::endSession (this=0x187d110) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/Core.cpp:1777 #17 0x00007f381c3373a3 in llvm::orc::LLJIT::~LLJIT (this=0x18a73b0, __in_chrg=<optimized out>) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp:1002 #18 0x00007f381c38af48 in LLVMOrcDisposeLLJIT (J=0x18a73b0) at /home/fabien/llvm-src/llvm/lib/ExecutionEngine/Orc/OrcV2CBindings.cpp:561 #19 0x00007f381c596612 in llvm_shutdown (code=<optimized out>, arg=140722242323824) at llvmjit.c:892 #20 0x00000000007d4385 in proc_exit_prepare (code=code@entry=0) at ipc.c:209 #21 0x00000000007d4288 in proc_exit (code=code@entry=0) at ipc.c:107 regards, tom lane
> It'd sure be nice if seawasp stopped spamming the buildfarm failure log, > too. There was a silent API breakage (same API, different behavior, how nice…) in llvm main that Andres figured out, which will have to be fixed at some point, so this is reminder that it is still a todo… Not sure when a fix is planned, though. I'm afraid portability may require that different code is executed depending on llvm version. Or maybe we should wrestle a revert on llvm side? Hmmm… So I'm not very confident that the noise will go away quickly, sorry. -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> It'd sure be nice if seawasp stopped spamming the buildfarm failure log, >> too. > There was a silent API breakage (same API, different behavior, how nice…) > in llvm main that Andres figured out, which will have to be fixed at some > point, so this is reminder that it is still a todo… If it were *our* todo, that would be one thing; but it isn't. > Not sure when a fix is > planned, though. I'm afraid portability may require that different code is > executed depending on llvm version. Or maybe we should wrestle a revert on > llvm side? Hmmm… > So I'm not very confident that the noise will go away quickly, sorry. Could you please just shut down the animal until that's dealt with? It's extremely unpleasant to have to root through a lot of useless failures to find the ones that might be of interest. Right now serinus and seawasp are degrading this report nearly to uselessness: https://buildfarm.postgresql.org/cgi-bin/show_failures.pl regards, tom lane
Hello Tom, >> So I'm not very confident that the noise will go away quickly, sorry. > > Could you please just shut down the animal until that's dealt with? Hmmm… Obviously I can. However, please note that the underlying logic of "a test is failing, let's just remove it" does not sound right to me at all:-( The test is failing because there is a problem, and shuting down the test to improve a report does not in any way help to fix it, it just helps to hide it. > It's extremely unpleasant to have to root through a lot of useless > failures I do not understand how they are useless. Pg does not work properly with current LLVM, and keeps on not working. I think that this information is worthy, even if I do not like it and would certainly prefer a quick fix. > to find the ones that might be of interest. Right now serinus and > seawasp are degrading this report nearly to uselessness: > > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl IMHO, the report should be improved, not the test removed. If you insist I will shut down the animal, bit I'd prefer not to. I think that the reminder has value, and just because some report is not designed to handle this nicely does not seem like a good reason to do that. -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> Could you please just shut down the animal until that's dealt with? > The test is failing because there is a problem, and shuting down the test > to improve a report does not in any way help to fix it, it just helps to > hide it. Our buildfarm is run for the use of the Postgres project, not the LLVM project. I'm not really happy that it contains any experimental-compiler animals at all, but as long as they're unobtrusive I can stand it. serinus and seawasp are being the opposite of unobtrusive. If you don't want to shut it down entirely, maybe backing it off to run only once a week would be an acceptable compromise. Since you only update its compiler version once a week, I doubt we learn much from runs done more often than that anyway. regards, tom lane
Hello Tom, >>> Could you please just shut down the animal until that's dealt with? > >> The test is failing because there is a problem, and shuting down the test >> to improve a report does not in any way help to fix it, it just helps to >> hide it. > > Our buildfarm is run for the use of the Postgres project, not the LLVM > project. The point of these animals is to have early warning of upcoming compiler changes. Given the release cycle of the project and the fact that a version is expected to work for 5 years, this is a clear benefit for postgres, IMO. When the compiler is broken, it is noisy, too bad. In this instance the compiler is not broken, but postgres is. If the consensus is that these animals are useless, I'll remove them, and be sad that the community is not able to see their value. > I'm not really happy that it contains any experimental-compiler > animals at all, but as long as they're unobtrusive I can stand it. > serinus and seawasp are being the opposite of unobtrusive. I think that the problem is the report, not the failing animal. In French we say "ce n’est pas en cassant le thermomètre qu’on fait tomber la fièvre", which is an equivalent of "don't shoot the messenger". > If you don't want to shut it down entirely, maybe backing it off to > run only once a week would be an acceptable compromise. Since you > only update its compiler version once a week, I doubt we learn much > from runs done more often than that anyway. Hmmm… I can slow it down. We will wait one week to learn that the problems have been fixed, wow. <Sigh>. -- Fabien.
On Sat, Jun 19, 2021 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fabien COELHO <coelho@cri.ensmp.fr> writes: > >> It'd sure be nice if seawasp stopped spamming the buildfarm failure log, > >> too. > > > There was a silent API breakage (same API, different behavior, how nice…) > > in llvm main that Andres figured out, which will have to be fixed at some > > point, so this is reminder that it is still a todo… > > If it were *our* todo, that would be one thing; but it isn't. Over on the other thread[1] we learned that this is an API change affecting reference counting semantics[2], so unless there is some discussion somewhere about reverting the LLVM change that I'm unaware of, I'm guessing we're going to need to change our code sooner or later. I have a bleeding edge LLVM on my dev machine, and I'm willing to try to reproduce the crash and write the trivial patch (that is, figure out the right preprocessor incantation to detect the version or feature, and bump the reference count as appropriate), if Andres and/or Fabien aren't already on the case. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGLEy8mgtN7BNp0ooFAjUedDTJj5dME7NxLU-m91b85siA%40mail.gmail.com [2] https://github.com/llvm/llvm-project/commit/c8fc5e3ba942057d6c4cdcd1faeae69a28e7b671
>>> There was a silent API breakage (same API, different behavior, how nice…) >>> in llvm main that Andres figured out, which will have to be fixed at some >>> point, so this is reminder that it is still a todo… >> >> If it were *our* todo, that would be one thing; but it isn't. > > Over on the other thread[1] we learned that this is an API change > affecting reference counting semantics[2], so unless there is some > discussion somewhere about reverting the LLVM change that I'm unaware > of, I'm guessing we're going to need to change our code sooner or > later. Indeed, I'm afraid the solution will have to be on pg side. > I have a bleeding edge LLVM on my dev machine, and I'm willing to try to > reproduce the crash and write the trivial patch (that is, figure out the > right preprocessor incantation to detect the version or feature, and > bump the reference count as appropriate), if Andres and/or Fabien aren't > already on the case. I'm not in the case, I'm only the one running the farm animal which barks too annoyingly for Tom. -- Fabien.