Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend. - Mailing list pgsql-bugs

From Amit Langote
Subject Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.
Msg-id CA+HiwqFpMMk4djpzcfa=yv0_55UmGNiD96gS+iXMPtx=ch3R=g@mail.gmail.com
In response to Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-bugs
Reviving an old thread.

On Tue, Dec 8, 2020 at 4:35 PM Andres Freund <andres@anarazel.de> wrote:
> On 2020-12-07 08:58:18 +0000, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
> > Thanks for the comment.
> > I newly cloned the latest version 13.1 source and then rebuilt it. The
> > segmentation fault has recurred. I tried the backtrace feature, but it
> > didn't work.
>
> What exactly do you mean by that? You attached to the worker with a
> debugger, and you didn't get a backtrace once it crashed? Same with a
> core file?  Or were you hoping for the segfault to automatically
> generate a backtrace?
>
> The easiest way would be to enable core files with 'ulimit -c unlimited'
> in the shell you start postgres in, and to set the
> 'jit_debugging_support=1' option. Then, once the crash has happened, you
> should be able to find a core file.
>
> You then can inspect that core file with
> gdb /path/to/postgres -core /path/to/core
>
> and execute 'bt' inside.
>
>
> > The function specification may not have been good.
>
> Which function's specification?
>
>
> > Segmentation faults do not always occur, only occasionally after running pg_cancel_backend several times.
> > Also, it only seems to happen if I cancel the parallel worker process.
>
> I tried this a couple hundred times without success.

I happened to run into an odd JIT-related crash that may be related.

It happens (almost) every time I run make installcheck with
force_parallel_mode=regress against a build that uses LLVM 7.0.
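
For anyone trying to reproduce this, a server configuration along the
following lines may be a reasonable starting point. It is only a sketch:
how the settings were applied is not stated here, and apart from
force_parallel_mode the lines below are assumptions (jit_above_cost = 0
is a guess at making JIT kick in for the cheap regression queries, and
jit_debugging_support is the option suggested earlier in the thread).

```
# postgresql.conf fragment (assumed setup, not taken from the report)
force_parallel_mode = regress   # the setting named in the report
jit = on                        # assumption: JIT enabled on the test server
jit_above_cost = 0              # assumption: JIT-compile even cheap queries
jit_debugging_support = on      # takes effect at server start; helps gdb resolve JIT frames
```

With the server restarted under these settings, make installcheck is run
against it as usual.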

The backtrace looks like this:

(gdb) bt
#0  0x00007f88fe18c8cd in std::_Function_handler<void (unsigned long,
llvm::object::ObjectFile const&),
llvm::OrcCBindingsStack::OrcCBindingsStack(llvm::TargetMachine&,
std::function<std::unique_ptr<llvm::orc::IndirectStubsManager,
std::default_delete<llvm::orc::IndirectStubsManager> >
()>)::{lambda(unsigned long, llvm::object::ObjectFile
const&)#3}>::_M_invoke(std::_Any_data const&, unsigned long,
llvm::object::ObjectFile const&) ()
   from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#1  0x00007f88fe18e578 in llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::shared_ptr<llvm::RuntimeDyld::MemoryManager> >::~ConcreteLinkedObject() ()
   from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#2  0x00007f88fe18e7aa in std::_Rb_tree<unsigned long,
std::pair<unsigned long const,
std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject,
std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject>
> >, std::_Select1st<std::pair<unsigned long const,
std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject,
std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject>
> > >, std::less<unsigned long>, std::allocator<std::pair<unsigned
long const, std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject,
std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject>
> > > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const,
std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject,
std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject>
> > >*) ()
   from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#3  0x00007f88fe19ac91 in llvm::OrcCBindingsStack::~OrcCBindingsStack() ()
   from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#4  0x00007f88fe19afaa in LLVMOrcDisposeInstance ()
   from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#5  0x00007f8900189c21 in llvm_shutdown (code=1, arg=0) at llvmjit.c:940
#6  0x0000000000ce541e in proc_exit_prepare (code=1) at ipc.c:209
#7  0x0000000000ce5223 in proc_exit (code=1) at ipc.c:107
#8  0x0000000000bd8414 in StartBackgroundWorker () at bgworker.c:821
#9  0x0000000000bec38c in do_start_bgworker (rw=0x3380b30) at postmaster.c:5809
#10 0x0000000000becab3 in maybe_start_bgworkers () at postmaster.c:6033
#11 0x0000000000bea9fa in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5190
#12 <signal handler called>
#13 0x00007f8908976b23 in __select_nocancel () from /lib64/libc.so.6
#14 0x0000000000be20de in ServerLoop () at postmaster.c:1772
#15 0x0000000000be1646 in PostmasterMain (argc=3, argv=0x33585b0) at postmaster.c:1480
#16 0x00000000009e0794 in main (argc=3, argv=0x33585b0) at main.c:197

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com