Thread: PG11 jit failing on ppc64el

PG11 jit failing on ppc64el

From
Christoph Berg
Date:
PG 11 beta1 is failing on ppc64el. All Debian/Ubuntu releases except
Jessie are affected. My guess is on problems with llvm/jit, because of
the C++ style error message (and LLVM is disabled on Jessie).

Debian sid:
15:59:29 2018-05-22 13:59:24.914 UTC [29081] pg_regress/strings STATEMENT:  SELECT chr(0);
15:59:29 terminate called after throwing an instance of 'std::bad_function_call'
15:59:29   what():  bad_function_call
15:59:29 2018-05-22 13:59:25.026 UTC [28961] LOG:  server process (PID 29085) was terminated by signal 6: Aborted
15:59:29 2018-05-22 13:59:25.026 UTC [28961] DETAIL:  Failed process was running: INSERT INTO TEMP_GROUP
15:59:29       SELECT 1, (- i.f1), (- f.f1)
15:59:29       FROM INT4_TBL i, FLOAT8_TBL f;
15:59:29 2018-05-22 13:59:25.026 UTC [28961] LOG:  terminating any other active server processes
15:59:29 2018-05-22 13:59:25.026 UTC [29078] WARNING:  terminating connection because of crash of another server process

Debian stretch:
15:58:45 2018-05-22 13:58:43.778 UTC [29981] pg_regress/indexing STATEMENT:  insert into fastpath values (1, 'b1', 100.00);
15:58:45 terminate called after throwing an instance of 'std::bad_function_call'
15:58:45   what():  bad_function_call
15:58:45 2018-05-22 13:58:43.975 UTC [28908] LOG:  server process (PID 29981) was terminated by signal 6: Aborted
15:58:45 2018-05-22 13:58:43.975 UTC [28908] DETAIL:  Failed process was running: select md5(string_agg(a::text, b order by a, b asc)) from fastpath
15:58:45         where a >= 1000 and a < 2000 and b > 'b1' and b < 'b3';
15:58:45 2018-05-22 13:58:43.975 UTC [28908] LOG:  terminating any other active server processes
15:58:45 2018-05-22 13:58:43.975 UTC [30037] WARNING:  terminating connection because of crash of another server process

Christoph


Re: PG11 jit failing on ppc64el

From
Andres Freund
Date:
Hi,

On 2018-05-22 16:33:57 +0200, Christoph Berg wrote:
> PG 11 beta1 is failing on ppc64el. All Debian/Ubuntu releases except
> Jessie are affected. My guess is on problems with llvm/jit, because of
> the C++ style error message (and LLVM is disabled on Jessie).

It was a bug in LLVM that's fixed now. I guess you can either disable jit
on arm or ask the LLVM maintainer to backport it...

r328687 - but the expanded tests created a few problems (windows mainly,
but somewhere else too), so I'd just backport the actual code change.

- Andres


Re: PG11 jit failing on ppc64el

From
Christoph Berg
Date:
Re: Andres Freund 2018-05-22 <20180522151101.drsbh6p7ltxpmn65@alap3.anarazel.de>
> Hi,
> 
> On 2018-05-22 16:33:57 +0200, Christoph Berg wrote:
> > PG 11 beta1 is failing on ppc64el. All Debian/Ubuntu releases except
> > Jessie are affected. My guess is on problems with llvm/jit, because of
> > the C++ style error message (and LLVM is disabled on Jessie).
> 
> It was a bug in LLVM that's fixed now. I guess you can either disable jit
> on arm or ask the LLVM maintainer to backport it...
> 
> r328687 - but the expanded tests created a few problems (windows mainly,
> but somewhere else too), so I'd just backport the actual code change.

Thanks also for the extra details on IRC.

I've disabled --with-llvm on all platforms except amd64 i386 now. Will
try talking to the llvm maintainers in Debian to see if we can get
this fixed and have more coverage.

Christoph


Re: PG11 jit failing on ppc64el

From
Andres Freund
Date:

On May 23, 2018 4:59:00 AM PDT, Christoph Berg <myon@debian.org> wrote:
>Re: Andres Freund 2018-05-22 <20180522151101.drsbh6p7ltxpmn65@alap3.anarazel.de>
>> Hi,
>>
>> On 2018-05-22 16:33:57 +0200, Christoph Berg wrote:
>> > PG 11 beta1 is failing on ppc64el. All Debian/Ubuntu releases except
>> > Jessie are affected. My guess is on problems with llvm/jit, because of
>> > the C++ style error message (and LLVM is disabled on Jessie).
>>
>> It was a bug in LLVM that's fixed now. I guess you can either disable jit
>> on arm or ask the LLVM maintainer to backport it...
>>
>> r328687 - but the expanded tests created a few problems (windows mainly,
>> but somewhere else too), so I'd just backport the actual code change.
>
>Thanks also for the extra details on IRC.
>
>I've disabled --with-llvm on all platforms except amd64 i386 now. Will
>try talking to the llvm maintainers in Debian to see if we can get
>this fixed and have more coverage.

How about making that dependent on the llvm version being < 7?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: PG11 jit failing on ppc64el

From
Christoph Berg
Date:
Re: Andres Freund 2018-05-23 <38F42310-62AC-48B1-8A83-639B97E5FA81@anarazel.de>
> >I've disabled --with-llvm on all platforms except amd64 i386 now. Will
> >try talking to the llvm maintainers in Debian to see if we can get
> >this fixed and have more coverage.
> 
> How about making that dependent on the llvm version being < 7?

It does work on x86 for <7, so the architecture would still need to be
coded into debian/control and/or debian/rules. Also, we can't depend
on "llvm (>= 7 if it exists)"...

Christoph


Re: PG11 jit failing on ppc64el

From
Andres Freund
Date:
On 2018-05-23 22:45:26 +0200, Christoph Berg wrote:
> Re: Andres Freund 2018-05-23 <38F42310-62AC-48B1-8A83-639B97E5FA81@anarazel.de>
> > >I've disabled --with-llvm on all platforms except amd64 i386 now. Will
> > >try talking to the llvm maintainers in Debian to see if we can get
> > >this fixed and have more coverage.
> > 
> > How about making that dependent on the llvm version being < 7?
> 
> It does work on x86 for <7, so the architecture would still need to be
> coded into debian/control and/or debian/rules. Also, we can't depend
> on "llvm (>= 7 if it exists)"...

What I meant was that I'd conditionally enable it for the other archs
when the version is >= 7.

Greetings,

Andres Freund


Re: PG11 jit failing on ppc64el

From
Christoph Berg
Date:
Re: Andres Freund 2018-05-23 <20180523205521.mdzwldqabriupiz5@alap3.anarazel.de>
> What I meant was that I'd conditionally enable it for the other archs
> when the version is >= 7.

Good idea, but unfortunately there's a bunch of architectures on
ports.debian.org that llvm hasn't been ported to yet :(, so the
architecture qualification on the dependencies is still necessary.

Christoph
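
Editorial sketch: the packaging rule being worked out in the messages above can be written as a small predicate. This is only an illustration under stated assumptions, not the actual debian/rules logic. The function name `enable_llvm`, the two architecture sets, and the cutoff of LLVM 7 for non-x86 architectures (taken from Andres's suggestion) are all hypothetical.

```python
from typing import Optional

# Hypothetical architecture sets, for illustration only.
# x86 is reported in the thread to work with LLVM versions below 7.
X86_ARCHS = {"amd64", "i386"}
# Architectures assumed here to have an LLVM port at all; ports without
# LLVM must stay excluded regardless of version.
LLVM_PORTED_ARCHS = X86_ARCHS | {"ppc64el", "arm64"}


def enable_llvm(arch: str, llvm_major: Optional[int]) -> bool:
    """Return True if the build should be configured --with-llvm.

    llvm_major is None when no LLVM package is available.
    """
    if llvm_major is None or arch not in LLVM_PORTED_ARCHS:
        # No LLVM for this architecture: the arch qualifier is still needed.
        return False
    if arch in X86_ARCHS:
        # Works with the older LLVM releases too.
        return True
    # Other architectures: only with an LLVM that contains the fix
    # (assumed cutoff: major version 7).
    return llvm_major >= 7
```

For example, `enable_llvm("ppc64el", 6)` is False while `enable_llvm("ppc64el", 7)` is True, and an architecture outside the ported set is rejected at any version. That last case is why a version check alone cannot replace the architecture list.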


Re: PG11 jit failing on ppc64el

From
Thomas Munro
Date:
On Thu, May 24, 2018 at 9:00 AM, Christoph Berg <myon@debian.org> wrote:
> Re: Andres Freund 2018-05-23 <20180523205521.mdzwldqabriupiz5@alap3.anarazel.de>
>> What I meant was that I'd conditionally enable it for the other archs
>> when the version is >= 7.
>
> Good idea, but unfortunately there's a bunch of architectures on
> ports.debian.org that llvm hasn't been ported to yet :(, so the
> architecture qualification on the dependencies is still necessary.

BTW It is working on arm64 too, starting with LLVM 6.  5 crashed the
same way as it does on ppc.  See build farm member eelpout which is
running Debian.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: PG11 jit failing on ppc64el

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> BTW It is working on arm64 too, starting with LLVM 6.  5 crashed the
> same way as it does on ppc.  See build farm member eelpout which is
> running Debian.

For entertainment's sake, I tried building --with-llvm on FreeBSD 12
arm64 (hey, gotta do something with this raspberry pi toy I got).
I used llvm-devel-7.0.d20180327 which seems to be the latest available in
FreeBSD's package system.  Builds cleanly, does not work at all.
SIGSEGV here:

#0  __clear_cache (start=0x4c055000, end=0x4c0566ec)
    at /usr/src/contrib/compiler-rt/lib/builtins/clear_cache.c:168
#1  0x000000004bb78d8c in llvm::sys::Memory::protectMappedMemory(llvm::sys::MemoryBlock const&, unsigned int) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#2  0x000000004b68f020 in llvm::SectionMemoryManager::applyMemoryGroupPermissions(llvm::SectionMemoryManager::MemoryGroup&, unsigned int) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#3  0x000000004b68ef38 in llvm::SectionMemoryManager::finalizeMemory(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#4  0x000000004b85d310 in llvm::RuntimeDyld::finalizeWithMemoryManagerLocking() ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#5  0x000000004ad22c38 in llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::__1::shared_ptr<llvm::RuntimeDyld::MemoryManager> >::finalize() ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#6  0x000000004ad236ec in llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::__1::shared_ptr<llvm::RuntimeDyld::MemoryManager> >::getSymbolMaterializer(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)::{lambda()#1}::operator()() const ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#7  0x000000004ad22084 in llvm::JITSymbol::getAddress() ()
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#8  0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#9  0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#10 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#11 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#12 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#13 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#14 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#15 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#16 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#17 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#18 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#19 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#20 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#21 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#22 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#23 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#24 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#25 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#26 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#27 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#28 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#29 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
#30 0x000000004ad1dbec in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&, std::__1::basic_string<char,
std::__1::char_traits<char>,std::__1::allocator<char> > const&, bool) () 
   from /home/tgl/installdir/lib/postgresql/llvmjit.so
... etc etc ...

Sure looks like infinite recursion in findSymbolAddress.  Thoughts?

            regards, tom lane
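
Editorial sketch: a backtrace like the one above is easier to read once runs of identical frames are collapsed, because a long run of one function is the usual signature of runaway recursion. The helper names `frame_functions` and `collapse_runs` are hypothetical. The code assumes gdb-style frames that each start with `#N` on one line (wrapped lines would need to be rejoined first).

```python
import re
from itertools import groupby

# Matches the start of a gdb frame line: "#N  [0xADDR in ]function(...)".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.*)$")


def frame_functions(backtrace):
    """Return the function name of every '#N' frame line, in order."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            # Keep only the text before the first '(' (the argument list).
            names.append(match.group(2).split("(", 1)[0].strip())
    return names


def collapse_runs(names):
    """Collapse consecutive identical names into (name, count) pairs."""
    return [(name, sum(1 for _ in group)) for name, group in groupby(names)]


if __name__ == "__main__":
    sample = "\n".join([
        "#0  __clear_cache (start=0x4c055000, end=0x4c0566ec)",
        "#1  0x01 in llvm::JITSymbol::getAddress() ()",
        "#2  0x02 in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&) ()",
        "#3  0x02 in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&) ()",
        "#4  0x02 in llvm::OrcCBindingsStack::findSymbolAddress(unsigned long&) ()",
    ])
    for name, count in collapse_runs(frame_functions(sample)):
        print(f"{count:4d} x {name}")
```

Note that a repeated function name only suggests recursion. Identical return addresses in every frame, as in the trace above, can also come from a debugger failing to unwind the stack correctly.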


Re: PG11 jit failing on ppc64el

From
Thomas Munro
Date:
On Thu, May 24, 2018 at 3:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> BTW It is working on arm64 too, starting with LLVM 6.  5 crashed the
>> same way as it does on ppc.  See build farm member eelpout which is
>> running Debian.
>
> For entertainment's sake, I tried building --with-llvm on FreeBSD 12
> arm64 (hey, gotta do something with this raspberry pi toy I got).

Neat.  Quite tempted to get one!

> I used llvm-devel-7.0.d20180327 which seems to be the latest available in
> FreeBSD's package system.  Builds cleanly, does not work at all.
> SIGSEGV here:
>
> [big ugly stack]
>
> Sure looks like infinite recursion in findSymbolAddress.  Thoughts?

Hmm.  I just tried llvm-devel-7.0.d20180327 on my amd64 FreeBSD 12
system and our make check passed with flying colours.  I guess there
could be a bug in LLVM or the FreeBSD 12 linker or their interaction
on ARM.  Maybe the cycle somehow comes from lines 376 and 391 of this:


https://github.com/llvm-mirror/llvm/blob/41d411071aefb16379415150d970171698b13ff9/lib/ExecutionEngine/Orc/OrcCBindingsStack.h

I know that LocalIndirectStubsManager is instantiated differently on
each architecture, but I couldn't immediately see how that could
produce the cycle and I'm currently avoiding the LLVM-internals rabbit
hole.  Maybe Andres has an idea?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: PG11 jit failing on ppc64el

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Thu, May 24, 2018 at 3:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> For entertainment's sake, I tried building --with-llvm on FreeBSD 12
>> arm64 (hey, gotta do something with this raspberry pi toy I got).

> Neat.  Quite tempted to get one!

Fair warning: the newest "3B+" model contains new ethernet and wireless
chips that none of the BSDen have drivers for yet.  I ended up spending
an extra $10 on a USB WiFi dongle so that I could get the thing on the
network.  Compared to the price of the RPI itself, seems like highway
robbery.  (But then again, the keyboard I plugged into it is worth
more than the RPI, not to mention the monitor.)

>> I used llvm-devel-7.0.d20180327 which seems to be the latest available in
>> FreeBSD's package system.  Builds cleanly, does not work at all.

> Hmm.  I just tried llvm-devel-7.0.d20180327 on my amd64 FreeBSD 12
> system and our make check passed with flying colours.

Hmph.  Looking closer, "does not work at all" is overly negative.
It gets about halfway through the core regression tests and then
crashes on one specific query in the "inherit" test:

2018-05-24 01:22:28.657 EDT [51790] LOG:  server process (PID 52037) was terminated by signal 11: Segmentation fault
2018-05-24 01:22:28.657 EDT [51790] DETAIL:  Failed process was running: select * from matest0 order by 1-id;

Still trying to get more info on exactly where it's going off the
rails --- gdb has got some problems with printing such deep stacks,
and for some reason "ulimit -s" doesn't work to make the available
stack space smaller.

            regards, tom lane