Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit) - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)
Msg-id 20220106170833.GA7796@telsasoft.com
In response to Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
There's no leak after running for ~5 weeks.

$ ps -O lstart,vsize,rss 17930
  PID                  STARTED    VSZ   RSS S TTY          TIME COMMAND
17930 Tue Nov 30 15:35:26 2021 1019464 117424 S ?      7-04:54:03 postgres: telsasoft ts 192.168.122.13(57640) idle

Unless you suggest otherwise, I'm planning to restart the DB soon and go back
to running the pgdg rpm binaries with jit=off rather than what I compiled and
patched locally.

On Thu, Nov 18, 2021 at 03:20:39PM -0600, Justin Pryzby wrote:
> On Wed, Nov 10, 2021 at 09:56:44AM -0600, Justin Pryzby wrote:
> > Thread starting here:
> > https://www.postgresql.org/message-id/20201001021609.GC8476%40telsasoft.com
> > 
> > On Fri, Dec 18, 2020 at 05:56:07PM -0600, Justin Pryzby wrote:
> > > I'm 99% sure the "bad_alloc" is from LLVM.  It happened multiple times on
> > > different servers (running a similar report) after setting jit=on during pg13
> > > upgrade, and never happened since re-setting jit=off.
> > 
> > Since this recurred a few times recently (now running pg14.0), and I finally
> > managed to get a non-truncated corefile...
> 
> I think the reason this recurred is that, since upgrading to pg14, I no longer
> had your memleak patches applied.  I'd forgotten about it, but was probably
> running a locally compiled postgres with your patches applied.
> 
> I should've mentioned that this crash was associated with the message from the
> original problem report:
> 
> |terminate called after throwing an instance of 'std::bad_alloc'
> |  what():  std::bad_alloc
> 
> The leak discussed on other threads seems fixed by your patches - I compiled
> v14 and am now running it with no visible leaks since last week.
> https://www.postgresql.org/message-id/flat/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de
> 
> As I understand it, there's still an issue with an allocation failure causing
> SIGABRT rather than FATAL.
> 
> It took me several tries to get the corefile since the process is huge, caused
> by the leak (and abrtd wanted to truncate it, nullifying its utility).
> 
> -rw-------. 1 postgres postgres 8.4G Nov 10 08:57 /var/lib/pgsql/14/data/core.31345
> 
> I installed more debug packages to get a fuller stacktrace.
> 
> #0  0x00007f2497880337 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f2497881a28 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2  0x00007f2487cbf265 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #3  0x00007f2487c66696 in __cxxabiv1::__terminate(void (*)()) () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #4  0x00007f2487c666c3 in std::terminate() () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #5  0x00007f2487c687d3 in __cxa_throw () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #6  0x00007f2487c686cd in operator new(unsigned long) () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #7  0x00007f2486477b9c in allocateBuckets (this=0x2ff7f38, this=0x2ff7f38, Num=<optimized out>)
>     at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:753
> No locals.
> #8  llvm::DenseMap<llvm::APInt, std::unique_ptr<llvm::ConstantInt, std::default_delete<llvm::ConstantInt> >, llvm::DenseMapAPIntKeyInfo, llvm::detail::DenseMapPair<llvm::APInt, std::unique_ptr<llvm::ConstantInt, std::default_delete<llvm::ConstantInt> > > >::grow (this=this@entry=0x2ff7f38, AtLeast=<optimized out>)
>     at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:691
>         OldNumBuckets = 33554432
>         OldBuckets = 0x7f23f3e42010
> #9  0x00007f2486477f29 in grow (AtLeast=<optimized out>, this=0x2ff7f38)
>     at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:461
> No locals.
> #10 InsertIntoBucketImpl<llvm::APInt> (TheBucket=<optimized out>, Lookup=..., Key=..., this=0x2ff7f38)
>     at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:510
>         NewNumEntries = <optimized out>
>         EmptyKey = <optimized out>
> #11 InsertIntoBucket<llvm::APInt const&> (Key=..., TheBucket=<optimized out>, this=0x2ff7f38)
>     at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:471
> No locals.
> #12 FindAndConstruct (Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:271
>         TheBucket = <optimized out>
> #13 operator[] (Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:275
> No locals.
> #14 llvm::ConstantInt::get (Context=..., V=...) at /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:550
>         pImpl = 0x2ff7eb0
> #15 0x00007f2486478263 in llvm::ConstantInt::get (Ty=0x2ff85a8, V=<optimized out>, isSigned=isSigned@entry=false)
>     at /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:571
> No locals.
> #16 0x00007f248648673d in LLVMConstInt (IntTy=<optimized out>, N=<optimized out>, SignExtend=SignExtend@entry=0)
>     at /usr/src/debug/llvm-5.0.1.src/lib/IR/Core.cpp:952
> No locals.
> #17 0x00007f2488f66c18 in l_ptr_const (type=0x3000650, ptr=<optimized out>)
>     at ../../../../src/include/jit/llvmjit_emit.h:29
>         c = <optimized out>
> #18 llvm_compile_expr (state=<optimized out>) at llvmjit_expr.c:246
>         op = 0x1a5317690
>         opcode = EEOP_OUTER_VAR
>         opno = 5
>         parent = <optimized out>
>         funcname = 0x1a53184e8 "evalexpr_4827_151"
>         context = 0x1ba79b8
>         b = <optimized out>
>         mod = 0x1a5513d30
>         eval_fn = <optimized out>
>         entry = <optimized out>
>         v_state = 0x1a5ce09e0
>         v_econtext = 0x1a5ce0a08
>         v_isnullp = 0x1a5ce0a30
>         v_tmpvaluep = 0x1a5ce0aa8
>         v_tmpisnullp = 0x1a5ce0b48
>         starttime = {tv_sec = 10799172, tv_nsec = 781670770}
>         endtime = {tv_sec = 7077194792, tv_nsec = 0}
>         __func__ = "llvm_compile_expr"
> [...]
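
For anyone following the SIGABRT-versus-FATAL point quoted above: frames #2 to #6
show operator new throwing std::bad_alloc, nothing catching it, and
std::terminate() ending in abort(). Below is a minimal standalone sketch of the
general C++ mechanism for intercepting that failure, a new-handler plus a catch
at the call boundary. It is not PostgreSQL or LLVM code. The names
on_out_of_memory, allocate_or_report, g_handler_called, g_sink and
kImpossibleSize are all made up for illustration, and whether a handler
installed by the backend would be seen by the operator new inside libLLVM-5.0.so
is an open question that this sketch does not answer.

```cpp
#include <cstddef>
#include <new>

// Illustrative flag (hypothetical): records that the new-handler ran.
static bool g_handler_called = false;

// Sink that makes the allocated pointer escape, so the compiler cannot
// optimize the allocation away.
static void *volatile g_sink = nullptr;

// A request size no allocator can satisfy (hypothetical constant).
static const std::size_t kImpossibleSize = static_cast<std::size_t>(-1) / 2;

// Hypothetical new-handler. A server would report a FATAL error here;
// this sketch only records the call and rethrows as std::bad_alloc.
static void on_out_of_memory() {
    g_handler_called = true;
    throw std::bad_alloc();
}

// Try to allocate `bytes` and turn an allocation failure into a return
// code instead of letting the exception escape and reach std::terminate().
// Returns 0 on success, 1 if the allocation failed.
int allocate_or_report(std::size_t bytes) {
    std::new_handler previous = std::set_new_handler(on_out_of_memory);
    int rc = 0;
    try {
        g_sink = ::operator new(bytes);   // calls the new-handler on failure
        ::operator delete(g_sink);
        g_sink = nullptr;
    } catch (const std::bad_alloc &) {
        rc = 1;                           // failure reported, process survives
    }
    std::set_new_handler(previous);       // restore whatever was installed
    return rc;
}
```

Calling allocate_or_report(64) should return 0 without touching the handler,
while allocate_or_report(kImpossibleSize) should run the handler and return 1.
Without the try/catch the second call would end the same way as the backtrace
above, in std::terminate() and abort().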
