Re: Out of Memory errors are frustrating as heck! - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Out of Memory errors are frustrating as heck!
Msg-id CAMkU=1yR9n+EWJx9zY1FJcUdw0FEYvGjhZTY_6n=YzB7ALpNuw@mail.gmail.com
In response to Re: Out of Memory errors are frustrating as heck!  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance


On Mon, Apr 15, 2019 at 11:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> To get it to happen faster, maybe you could run the server with a small
> setting of "ulimit -v"?  Or, you could try to capture it live in gdb.
> Unfortunately I don't know how to set a breakpoint for allocations into a
> specific context, and setting a breakpoint for any memory allocation is
> probably going to fire too often to be useful.

If you can use gdb at all, it's not that hard to break on allocations
into a specific context; I've done it many times.  The strategy is
basically

1. Let query run long enough for memory usage to start increasing,
then attach to backend with gdb.

2. Set breakpoint at, probably, AllocSetAlloc.  (In some cases,
reallocs could be the problem, but I doubt it here.)  Then "c".

3. When it stops, "p *context" and see if this is the context
you're looking for.  In this case, since we want to know about
allocations into ExecutorState and we know there's only one
active one, you just have to look at the context name.  In general
you might have to look at the backtrace.  Anyway, if it isn't the
one you want, just "c" until you get to an allocation into the
one you do want.

4. Once you have found out the address of the context you care
about, make the breakpoint conditional on the context argument
being that one.  It might look like this:

Breakpoint 1, AllocSetAlloc (context=0x1483be0, size=480) at aset.c:715
715     {
(gdb) p *context
$1 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false,
  methods = 0xa33f40, parent = 0x0, firstchild = 0x1537f30, prevchild = 0x0,
  nextchild = 0x0, name = 0xa3483f "TopMemoryContext", ident = 0x0,
  reset_cbs = 0x0}
(gdb) cond 1  context == 0x1483be0

5. Now repeatedly "c", and check the stack trace each time, for a
dozen or two times to get a feeling for where the allocations are
being requested.

In some cases you might be able to find the context address in a
more efficient way than what I suggest in #3 --- for instance,
you could instead set a breakpoint where the context is created
and snag its address immediately, or you could dig around in
backend data structures to find it.  But these ways generally
require more familiarity with the code than just watching the
requests go by.


Thanks for the recipe.  I can use gdb at all, just not very skillfully :)
 
With that as a starting point, this seems to work experimentally to short-circuit the loop described in your step 3 (which I fear could run to thousands of iterations in some situations):

cond 1 strcmp(context.name,"ExecutorState")==0

Also, I've found that in the last few versions of PostgreSQL, processes might receive an unreasonable number of SIGUSR1 signals (maybe due to parallelization?), so to avoid having to stand on the 'c' button, you might need this:

handle SIGUSR1 noprint nostop
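
Putting the pieces of this thread together, a gdb command file might look roughly like the sketch below. It assumes gdb is already attached to the backend (for example with "gdb -p" and the backend's PID) and that the context of interest is named "ExecutorState". The (int) cast and the 8-frame backtrace depth are additions for illustration, not part of the commands above: some gdb builds refuse to call strcmp without knowing its return type, and the depth is an arbitrary choice.

```
# Do not stop or print a message each time the backend receives SIGUSR1.
handle SIGUSR1 noprint nostop

# Stop only on allocations into a context whose name is "ExecutorState".
# The (int) cast tells gdb the return type of strcmp.
break AllocSetAlloc if (int) strcmp(context->name, "ExecutorState") == 0

# At each stop, print the innermost 8 frames (the depth is arbitrary).
commands
  bt 8
end

# Resume the backend; repeat "c" by hand after each stop.
continue
```

Evaluating strcmp means gdb calls a function inside the backend on every allocation, so the query will run noticeably slower while the breakpoint is active. Once the address of the context is known, switching to the plain pointer comparison from step 4 should be cheaper.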
 
Cheers,

Jeff
