Re: BUG #16696: Backend crash in llvmjit - Mailing list pgsql-bugs

From Dmitry Marakasov
Subject Re: BUG #16696: Backend crash in llvmjit
Date
Msg-id 20201104212015.GA30304@hades.panopticon
In response to Re: BUG #16696: Backend crash in llvmjit  ("Andres Freund" <andres@anarazel.de>)
Responses Re: BUG #16696: Backend crash in llvmjit
Re: BUG #16696: Backend crash in llvmjit
List pgsql-bugs
* Andres Freund (andres@anarazel.de) wrote:

> > > Environment details:
> > > - FreeBSD 12.1 amd64
> > > - PostgreSQL 13.0 (built from FreeBSD ports)
> > > - llvm-10.0.1 (build from FreeBSD ports)
> > 
> > My bad, it's actually llvm-9.0.1. Multiple llvm versions are installed on
> > the system, and PostgreSQL uses llvm9:
> > 
> > ldd /usr/local/lib/postgresql/llvmjit.so | grep LLVM
> >     libLLVM-9.so => /usr/local/llvm90/lib/libLLVM-9.so (0x800e00000)
> 
> Could you try generating a backtrace after turning jit_debugging_support on? That might give a bit more information.
> 
> I'll check once I'm home whether I can reproduce in my environment.

I did some digging. First of all, I've discovered that the problem
goes away if LLVM bitcode optimization is disabled (by commenting out
the llvm_optimize_module() call).

I've dumped the bitcode, compiled it back to assembly, and compared the
result with gdb's disassembly of the failing function. It didn't match
perfectly, but this place looked similar:

# %bb.84:                               # %op.32.inputcall
    movq    %rax, 5267(%r13)
    movb    %bl, 5275(%r13)
    movb    $0, 5263(%r13)
    movzbl  (%rax), %esi
    movl    __mb_sb_limit(%rip), %edi
    movq    _ThreadRuneLocale@GOTTPOFF(%rip), %rcx
    movq    %fs:0, %rdx
    movq    (%rdx,%rcx), %rcx
    cmpl    %esi, %edi
    movq    %rax, -96(%rbp)         # 8-byte Spill
    movl    %edi, -72(%rbp)         # 4-byte Spill
    movq    %rcx, -64(%rbp)         # 8-byte Spill
    jle     .LBB1_85

Here's my hypothesis:

The problem happens when the boolin() function is inlined by LLVM.
That function calls isspace() internally, which on FreeBSD is
locale-specific and involves caching some locale parameters in
a thread-local variable defined as

extern _Thread_local const _RuneLocale *_ThreadRuneLocale;

Execution crashes when trying to access that thread-local variable,
probably because something related to TLS is not set up properly in/for
LLVM.

I've confirmed this hypothesis by disabling the isspace() calls in
boolin(), which also fixed the problem.
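
To make the access pattern concrete, here is a minimal sketch of a
character-class lookup that goes through a per-thread cached pointer.
This is NOT FreeBSD's libc code: every demo_* name and the table layout
are made up for illustration, and only the general shape (a
_Thread_local pointer read on every call) is taken from the declaration
above. Once such a function is inlined, the TLS load ends up inside the
JIT-generated code itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-locale table of character classes. */
typedef struct {
    unsigned char is_space[256];
} demo_rune_locale;

/* Default table shared by all threads; read-only, so no locking needed. */
static const demo_rune_locale demo_default_locale = {
    .is_space = {
        [' ']  = 1, ['\t'] = 1, ['\n'] = 1,
        ['\v'] = 1, ['\f'] = 1, ['\r'] = 1,
    },
};

/* Per-thread cache, playing the role that _ThreadRuneLocale plays in
 * the hypothesis above (assumption: the real variable is used in a
 * broadly similar way). */
static _Thread_local const demo_rune_locale *demo_thread_locale;

/* Every call reads the thread-local pointer. If the compiler inlines
 * this into a caller, the TLS access is emitted in the caller's code. */
int demo_isspace(int c)
{
    if (c < 0 || c >= 256)
        return 0;
    if (demo_thread_locale == NULL)
        demo_thread_locale = &demo_default_locale;  /* fill the cache */
    return demo_thread_locale->is_space[c];
}
```

Calling demo_isspace(' ') from any thread first fills that thread's
cache and then consults the table; the point is only that the lookup
cannot be done without touching thread-local storage.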

-- 
Dmitry Marakasov   .   55B5 0596 FF1E 8D84 5F56  9510 D35A 80DD F9D2 F77D
amdmi3@amdmi3.ru  ..:              https://github.com/AMDmi3
