Re: BUG #16696: Backend crash in llvmjit - Mailing list pgsql-bugs
From | Dmitry Marakasov |
---|---|
Subject | Re: BUG #16696: Backend crash in llvmjit |
Date | |
Msg-id | 20201104235054.GB30304@hades.panopticon |
In response to | Re: BUG #16696: Backend crash in llvmjit (Dmitry Marakasov <amdmi3@amdmi3.ru>) |
List | pgsql-bugs |
* Dmitry Marakasov (amdmi3@amdmi3.ru) wrote:

> > > > Environment details:
> > > > - FreeBSD 12.1 amd64
> > > > - PostgreSQL 13.0 (built from FreeBSD ports)
> > > > - llvm-10.0.1 (built from FreeBSD ports)
> > >
> > > My bad, it's actually llvm-9.0.1. Multiple llvm versions are installed on
> > > the system, and PostgreSQL uses llvm9:
> > >
> > > ldd /usr/local/lib/postgresql/llvmjit.so | grep LLVM
> > >     libLLVM-9.so => /usr/local/llvm90/lib/libLLVM-9.so (0x800e00000)
> >
> > Could you try generating a backtrace after turning jit_debugging_support on?
> > That might give a bit more information.
> >
> > I'll check once I'm home whether I can reproduce in my environment.
>
> I did some digging. First of all, I've discovered that the problem
> goes away if llvm bitcode optimization is disabled (by commenting out
> the llvm_optimize_module call).
>
> I've dumped the opcode and tried compiling it back to match it against
> the gdb disassembly of the failing function. It didn't match perfectly,
> but this place looked similar:
>
> # %bb.84:                               # %op.32.inputcall
>     movq    %rax, 5267(%r13)
>     movb    %bl, 5275(%r13)
>     movb    $0, 5263(%r13)
>     movzbl  (%rax), %esi
>     movl    __mb_sb_limit(%rip), %edi
>     movq    _ThreadRuneLocale@GOTTPOFF(%rip), %rcx
>     movq    %fs:0, %rdx
>     movq    (%rdx,%rcx), %rcx
>     cmpl    %esi, %edi
>     movq    %rax, -96(%rbp)         # 8-byte Spill
>     movl    %edi, -72(%rbp)         # 4-byte Spill
>     movq    %rcx, -64(%rbp)         # 8-byte Spill
>     jle     .LBB1_85
>
> Here's my hypothesis:
>
> The problem happens when the boolin() function is inlined by LLVM.
> The named function calls isspace() internally, which on FreeBSD is
> locale-specific and involves caching some locale parameters in a
> thread-local variable defined as
>
> extern _Thread_local const _RuneLocale *_ThreadRuneLocale;
>
> The execution crashes on trying to access the named thread-local variable,
> probably because something related to TLS is not set up properly in/for
> LLVM.
>
> I've confirmed this hypothesis by disabling the isspace() calls in boolin(),
> which has also fixed the problem.

Long story short, I was able to mitigate the crash with the following
patch:

--- disable-inlining-tls-using-functions.patch begins here ---
commit f703544edc406293e39b7a59a245e798d18f458e
Author: Dmitry Marakasov <amdmi3@amdmi3.ru>
Date:   Thu Nov 5 02:56:00 2020 +0300

    Do not inline functions accessing TLS in LLVM JIT

diff --git src/backend/jit/llvm/llvmjit_inline.cpp src/backend/jit/llvm/llvmjit_inline.cpp
index 2617a46..a063edb 100644
--- src/backend/jit/llvm/llvmjit_inline.cpp
+++ src/backend/jit/llvm/llvmjit_inline.cpp
@@ -608,6 +608,16 @@ function_inlinable(llvm::Function &F,
 		if (rv->materialize())
 			elog(FATAL, "failed to materialize metadata");
 
+		/*
+		 * Don't inline functions with thread-local variables until
+		 * related crashes are investigated (see BUG #16696)
+		 */
+		if (rv->isThreadLocal()) {
+			ilog(DEBUG1, "cannot inline %s due to thread-local variable %s",
+				 F.getName().data(), rv->getName().data());
+			return false;
+		}
+
 		/*
 		 * Never want to inline externally visible vars, cheap enough to
 		 * reference.
--- disable-inlining-tls-using-functions.patch ends here ---

I don't know LLVM well enough to investigate this further, but my guess
is that something TLS-related is not initialized properly.

-- 
Dmitry Marakasov   .   55B5 0596 FF1E 8D84 5F56  9510 D35A 80DD F9D2 F77D
amdmi3@amdmi3.ru  ..:              https://github.com/AMDmi3
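For readers following the thread, here is a minimal sketch in C of the access pattern the hypothesis describes: a character-classification helper that consults a thread-local pointer on every call. All names here (demo_rune_locale, demo_thread_locale, demo_isspace, DEMO_CTYPE_SPACE) are hypothetical stand-ins for illustration; this is not FreeBSD's actual isspace() or _RuneLocale, only an assumed approximation of its shape.

```c
#include <stddef.h>

/* Hypothetical stand-in for a locale table; not FreeBSD's real _RuneLocale. */
typedef struct {
    unsigned long runetype[256];
} demo_rune_locale;

#define DEMO_CTYPE_SPACE 0x1UL

/* Process-wide default table, used when a thread has no override. */
static demo_rune_locale demo_default_locale;

/* Per-thread override; plays the role that _ThreadRuneLocale plays in libc. */
static _Thread_local const demo_rune_locale *demo_thread_locale = NULL;

/* Mark the ASCII whitespace characters in the default table. */
static void demo_init_default_locale(void)
{
    const char *spaces = " \t\n\v\f\r";

    for (const char *p = spaces; *p != '\0'; p++)
        demo_default_locale.runetype[(unsigned char) *p] |= DEMO_CTYPE_SPACE;
}

/*
 * Every call reads the thread-local pointer before touching the table.
 * If a function like this is inlined into JIT-generated code, that TLS
 * read is emitted in the generated code as well.
 */
static int demo_isspace(int c)
{
    const demo_rune_locale *rl =
        demo_thread_locale != NULL ? demo_thread_locale : &demo_default_locale;

    if (c < 0 || c > 255)
        return 0;
    return (rl->runetype[c] & DEMO_CTYPE_SPACE) != 0;
}
```

After calling demo_init_default_locale() once, demo_isspace(' ') consults the default table unless the calling thread has set demo_thread_locale. The point of the sketch is only that a seemingly trivial helper carries a hidden TLS dependency, which is what the patch above keys on when it refuses to inline.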