Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit) - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)
Date
Msg-id 20211118212039.GT17618@telsasoft.com
Whole thread Raw
In response to Re: terminate called after throwing an instance of 'std::bad_alloc'  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)
List pgsql-hackers
On Wed, Nov 10, 2021 at 09:56:44AM -0600, Justin Pryzby wrote:
> Thread starting here:
> https://www.postgresql.org/message-id/20201001021609.GC8476%40telsasoft.com
> 
> On Fri, Dec 18, 2020 at 05:56:07PM -0600, Justin Pryzby wrote:
> > I'm 99% sure the "bad_alloc" is from LLVM.  It happened multiple times on
> > different servers (running a similar report) after setting jit=on during pg13
> > upgrade, and never happened since re-setting jit=off.
> 
> Since this recurred a few times recently (now running pg14.0), and I finally
> managed to get a non-truncated corefile...

I think the reason this recurred is that, since upgrading to pg14, I no longer
had your memleak patches applied.  I'd forgotten about it, but was probably
running a locally compiled postgres with your patches applied.

I should've mentioned that this crash was associated with the message from the
original problem report:

|terminate called after throwing an instance of 'std::bad_alloc'
|  what():  std::bad_alloc

The leak discussed on other threads seems fixed by your patches - I compiled
v14 and now running with no visible leaks since last week.
https://www.postgresql.org/message-id/flat/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de

As I understand it, there's still an issue with an allocation failure causing
SIGABRT rather than FATAL.

It took me several tries to get the corefile since the process is huge, caused
by the leak (and abrtd wanted to truncate it, nullifying its utility).

-rw-------. 1 postgres postgres 8.4G Nov 10 08:57 /var/lib/pgsql/14/data/core.31345

I installed more debug packages to get a fuller stacktrace.

#0  0x00007f2497880337 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f2497881a28 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f2487cbf265 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#3  0x00007f2487c66696 in __cxxabiv1::__terminate(void (*)()) () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#4  0x00007f2487c666c3 in std::terminate() () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#5  0x00007f2487c687d3 in __cxa_throw () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#6  0x00007f2487c686cd in operator new(unsigned long) () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#7  0x00007f2486477b9c in allocateBuckets (this=0x2ff7f38, this=0x2ff7f38, Num=<optimized out>) at
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:753
No locals.
#8  llvm::DenseMap<llvm::APInt, std::unique_ptr<llvm::ConstantInt, std::default_delete<llvm::ConstantInt> >,
llvm::DenseMapAPIntKeyInfo,llvm::detail::DenseMapPair<llvm::APInt, std::unique_ptr<llvm::ConstantInt,
std::default_delete<llvm::ConstantInt>> > >::grow (this=this@entry=0x2ff7f38, AtLeast=<optimized out>)
 
    at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:691
        OldNumBuckets = 33554432
        OldBuckets = 0x7f23f3e42010
#9  0x00007f2486477f29 in grow (AtLeast=<optimized out>, this=0x2ff7f38) at
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:461
No locals.
#10 InsertIntoBucketImpl<llvm::APInt> (TheBucket=<optimized out>, Lookup=..., Key=..., this=0x2ff7f38) at
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:510
        NewNumEntries = <optimized out>
        EmptyKey = <optimized out>
#11 InsertIntoBucket<llvm::APInt const&> (Key=..., TheBucket=<optimized out>, this=0x2ff7f38) at
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:471
No locals.
#12 FindAndConstruct (Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:271
        TheBucket = <optimized out>
#13 operator[] (Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:275
No locals.
#14 llvm::ConstantInt::get (Context=..., V=...) at /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:550
        pImpl = 0x2ff7eb0
#15 0x00007f2486478263 in llvm::ConstantInt::get (Ty=0x2ff85a8, V=<optimized out>, isSigned=isSigned@entry=false) at
/usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:571
No locals.
#16 0x00007f248648673d in LLVMConstInt (IntTy=<optimized out>, N=<optimized out>, SignExtend=SignExtend@entry=0) at
/usr/src/debug/llvm-5.0.1.src/lib/IR/Core.cpp:952
No locals.
#17 0x00007f2488f66c18 in l_ptr_const (type=0x3000650, ptr=<optimized out>) at
../../../../src/include/jit/llvmjit_emit.h:29
        c = <optimized out>
#18 llvm_compile_expr (state=<optimized out>) at llvmjit_expr.c:246
        op = 0x1a5317690
        opcode = EEOP_OUTER_VAR
        opno = 5
        parent = <optimized out>
        funcname = 0x1a53184e8 "evalexpr_4827_151"
        context = 0x1ba79b8
        b = <optimized out>
        mod = 0x1a5513d30
        eval_fn = <optimized out>
        entry = <optimized out>
        v_state = 0x1a5ce09e0
        v_econtext = 0x1a5ce0a08
        v_isnullp = 0x1a5ce0a30
        v_tmpvaluep = 0x1a5ce0aa8
        v_tmpisnullp = 0x1a5ce0b48
        starttime = {tv_sec = 10799172, tv_nsec = 781670770}
        endtime = {tv_sec = 7077194792, tv_nsec = 0}
        __func__ = "llvm_compile_expr"
[...]



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Mixing CC and a different CLANG seems like a bad idea
Next
From: Andrew Dunstan
Date:
Subject: Re: Time to drop plpython2?