Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend. - Mailing list pgsql-bugs
From | Amit Langote |
---|---|
Subject | Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend. |
Date | |
Msg-id | CA+HiwqFpMMk4djpzcfa=yv0_55UmGNiD96gS+iXMPtx=ch3R=g@mail.gmail.com |
In response to | Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend. (Andres Freund <andres@anarazel.de>) |
Responses | Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend. (Amit Langote <amitlangote09@gmail.com>) |
List | pgsql-bugs |
Reviving an old thread.

On Tue, Dec 8, 2020 at 4:35 PM Andres Freund <andres@anarazel.de> wrote:
> On 2020-12-07 08:58:18 +0000, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
> > Thanks for the comment.
> > I newly cloned the latest version 13.1 source and then rebuilt it. The
> > segmentation fault has recurred. I tried the backtrace feature, but it
> > didn't work.
>
> What exactly do you mean by that? You attached to the worker with a
> debugger, and you didn't get a backtrace once it crashed? Same with a
> core file? Or were you hoping for the segfault to automatically
> generate a backtrace?
>
> The easiest way would be to enable core files with 'ulimit -c unlimited'
> in the shell you start postgres in, and to set the
> 'jit_debugging_support=1' option. Then, once the crash happened, you
> should be able to find a core file.
>
> You then can inspect that core file with
> gdb /path/to/postgres -core /path/to/core
>
> and execute 'bt' inside.
>
>
> > The function specification may not have been good.
>
> Which function's specification?
>
>
> > Segmentation faults do not always occur, but occasionally after running pg_cancel_backend several times.
> > Also, it only seems to happen if I canceled the parallel worker process.
>
> I tried this a couple hundred times without success.

I happened to run into an odd JIT-related crash that may be related. It happens (almost) every time I run make installcheck with force_parallel_mode=regress on a build using LLVM 7.0.
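For anyone trying to reproduce this, here is a minimal shell sketch that strings together the core-file steps Andres describes and the force_parallel_mode=regress run. It is an illustration under assumptions, not commands taken from the thread: the function names are hypothetical helpers, the data directory and paths are placeholders, and passing the settings through pg_ctl -o and PGOPTIONS is this sketch's own choice. gdb -batch -ex bt is used as a non-interactive equivalent of starting gdb and typing 'bt'.

```shell
#!/bin/sh
# Sketch only: every function name below is a hypothetical helper,
# not part of PostgreSQL or gdb.

# Allow core files in this shell and in every process it starts.
# Same effect as 'ulimit -c unlimited' when the hard limit permits it;
# otherwise the soft limit is raised as far as the hard limit allows.
enable_core_files() {
    ulimit -S -c unlimited 2>/dev/null || ulimit -S -c "$(ulimit -H -c)"
}

# Start the server with JIT debugging support so gdb can resolve
# JIT-compiled frames. $1 is the data directory (placeholder argument).
start_with_jit_debugging() {
    pg_ctl -D "$1" -o "-c jit_debugging_support=on" start
}

# Run the regression suite against the running server with parallel
# mode forced. Using PGOPTIONS here is an assumption of this sketch,
# not necessarily how the crash above was triggered.
run_parallel_installcheck() {
    PGOPTIONS="-c force_parallel_mode=regress" make installcheck
}

# Print the backtrace from a core file without an interactive session.
# $1 is the postgres binary, $2 the core file (placeholder arguments).
core_backtrace() {
    gdb -batch -ex bt "$1" "$2"
}
```

Usage, with placeholder paths and run from the same shell: enable_core_files; start_with_jit_debugging /path/to/data; run_parallel_installcheck from the top of a configured source tree; and after a crash, core_backtrace /path/to/postgres /path/to/core. Calling enable_core_files in the shell that launches pg_ctl matters, because the server inherits its resource limits from that shell. Where the core file is written depends on the operating system's configuration (on Linux, kernel.core_pattern).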
The backtrace looks like this:

(gdb) bt
#0 0x00007f88fe18c8cd in std::_Function_handler<void (unsigned long, llvm::object::ObjectFile const&), llvm::OrcCBindingsStack::OrcCBindingsStack(llvm::TargetMachine&, std::function<std::unique_ptr<llvm::orc::IndirectStubsManager, std::default_delete<llvm::orc::IndirectStubsManager> > ()>)::{lambda(unsigned long, llvm::object::ObjectFile const&)#3}>::_M_invoke(std::_Any_data const&, unsigned long, llvm::object::ObjectFile const&) () from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#1 0x00007f88fe18e578 in llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject<std::shared_ptr<llvm::RuntimeDyld::MemoryManager> >::~ConcreteLinkedObject() () from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#2 0x00007f88fe18e7aa in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > >, std::_Select1st<std::pair<unsigned long const, std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > > >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > > > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, std::unique_ptr<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject, std::default_delete<llvm::orc::RTDyldObjectLinkingLayerBase::LinkedObject> > > >*) () from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#3 0x00007f88fe19ac91 in llvm::OrcCBindingsStack::~OrcCBindingsStack() () from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#4 0x00007f88fe19afaa in LLVMOrcDisposeInstance () from /opt/rh/llvm-toolset-7.0/root/usr/lib64/libLLVM-7.so
#5 0x00007f8900189c21 in llvm_shutdown (code=1, arg=0) at llvmjit.c:940
#6 0x0000000000ce541e in proc_exit_prepare (code=1) at ipc.c:209
#7 0x0000000000ce5223 in proc_exit (code=1) at ipc.c:107
#8 0x0000000000bd8414 in StartBackgroundWorker () at bgworker.c:821
#9 0x0000000000bec38c in do_start_bgworker (rw=0x3380b30) at postmaster.c:5809
#10 0x0000000000becab3 in maybe_start_bgworkers () at postmaster.c:6033
#11 0x0000000000bea9fa in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5190
#12 <signal handler called>
#13 0x00007f8908976b23 in __select_nocancel () from /lib64/libc.so.6
#14 0x0000000000be20de in ServerLoop () at postmaster.c:1772
#15 0x0000000000be1646 in PostmasterMain (argc=3, argv=0x33585b0) at postmaster.c:1480
#16 0x00000000009e0794 in main (argc=3, argv=0x33585b0) at main.c:197

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com