Re: Memory leak from ExecutorState context? - Mailing list pgsql-hackers

From Jehan-Guillaume de Rorthais
Subject Re: Memory leak from ExecutorState context?
Date
Msg-id 20230302130838.717e888d@karst
In response to Re: Memory leak from ExecutorState context?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Memory leak from ExecutorState context?
List pgsql-hackers
On Thu, 2 Mar 2023 01:30:27 +0100
Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 3/2/23 00:18, Jehan-Guillaume de Rorthais wrote:
> >>> ExecHashIncreaseNumBatches was really chatty, having hundreds of thousands
> >>> of calls, always short-cut'ed to 1048576, I guess because of the
> >>> conditional block «/* safety check to avoid overflow */» appearing early
> >>> in this function.
> >[...] But what about this other short-cut then?
> >
> >     /* do nothing if we've decided to shut off growth */
> >     if (!hashtable->growEnabled)
> >         return;
> >
> >     [...]
> >
> >     /*
> >      * If we dumped out either all or none of the tuples in the table, disable
> >      * further expansion of nbatch.  This situation implies that we have
> >      * enough tuples of identical hashvalues to overflow spaceAllowed.
> >      * Increasing nbatch will not fix it since there's no way to subdivide the
> >      * group any more finely. We have to just gut it out and hope the server
> >      * has enough RAM.
> >      */
> >     if (nfreed == 0 || nfreed == ninmemory)
> >     {
> >         hashtable->growEnabled = false;
> >   #ifdef HJDEBUG
> >         printf("Hashjoin %p: disabling further increase of nbatch\n",
> >                hashtable);
> >   #endif
> >     }
> >
> > If I guess correctly, the function is not able to split the current batch,
> > so it sits and hopes. This is a much better suspect and I can surely track
> > this from gdb.
>
> Yes, this would make much more sense - it'd be consistent with the
> hypothesis that this is due to number of batches exploding (it's a
> protection exactly against that).
>
> You specifically mentioned the other check earlier, but now I realize
> you've been just speculating it might be that.

Yes, sorry about that, I jumped on that speculation without actually digging
into it much...

[...]
> But I have another idea - put a breakpoint on makeBufFile() which is the
> bit that allocates the temp files including the 8kB buffer, and print in
> what context we allocate that. I have a hunch we may be allocating it in
> the ExecutorState. That'd explain all the symptoms.

That's what I was wondering as well last night.

So, on your advice, I set a breakpoint on makeBufFile:

  (gdb) info br
  Num     Type           Disp Enb Address            What
  1       breakpoint     keep y   0x00000000007229df in makeBufFile
          bt 10
          p CurrentMemoryContext.name


Then, I disabled it and ran the query up to this mem usage:

   VIRT    RES    SHR S  %CPU %MEM
  20.1g   7.0g  88504 t   0.0 22.5

Then, I enabled the breakpoint and looked at around 600 backtraces and context
names before getting bored. They **all** looked like this:

  Breakpoint 1, BufFileCreateTemp (...)                at buffile.c:201
  201     in buffile.c
  #0  BufFileCreateTemp (...)                             buffile.c:201
  #1  ExecHashJoinSaveTuple (tuple=0x1952c180, ...)       nodeHashjoin.c:1238
  #2  ExecHashJoinImpl (parallel=false, pstate=0x31a6418) nodeHashjoin.c:398
  #3  ExecHashJoin (pstate=0x31a6418)                     nodeHashjoin.c:584
  #4  ExecProcNodeInstr (node=<optimized out>)            execProcnode.c:462
  #5  ExecProcNode (node=0x31a6418)
  #6  ExecSort (pstate=0x31a6308)
  #7  ExecProcNodeInstr (node=<optimized out>)
  #8  ExecProcNode (node=0x31a6308)
  #9  fetch_input_tuple (aggstate=aggstate@entry=0x31a5ea0)

  $421643 = 0x99d7f7 "ExecutorState"

These 600-ish 8kB buffers were all allocated in "ExecutorState". I could
probably log many more of them if more checks/stats need to be collected, but
it really slows down the query a lot, leaving it only 1-5% of CPU time instead
of the usual 100%.

So it's not exactly a leak, as the memory would be released at the end of the
query, but I suppose these buffers should be allocated in a shorter-lived
context to avoid this memory bloat, am I right?
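
Just to illustrate what I mean, here is a rough, purely hypothetical sketch, not
a patch: ExecHashJoinSaveTuple() does not receive the HashJoinTable today, so
its signature (or its caller) would have to change to pass it down. Assuming
such a helper sat in nodeHashjoin.c (which already has the needed includes),
and assuming hashCxt is an acceptable home for these buffers, it would replace
the bare BufFileCreateTemp() call:

  /*
   * Hypothetical sketch only: create a batch BufFile (and its 8kB buffer)
   * under the hash table's own hashCxt instead of whatever
   * CurrentMemoryContext happens to be at that point (ExecutorState in the
   * traces above), so it is at least accounted for and released with the
   * hash table.
   */
  static BufFile *
  hashjoin_create_batch_file(HashJoinTable hashtable)
  {
      MemoryContext oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
      BufFile    *file = BufFileCreateTemp(false);

      MemoryContextSwitchTo(oldcxt);
      return file;
  }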

> BTW with how many batches does the hash join start?

* nbatch went from 32 to 1048576 before growEnabled was switched to false, as
  suspected
* original and current nbuckets were set to 1048576 immediately
* allowed space is set to work_mem, but current space usage is 1.3GB, as
  measured previously, just before the system refused any more memory
  allocations.
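
A quick back-of-the-envelope, if I'm counting right: with nbatch = 1048576 and
one 8kB buffer per open BufFile, the inner and outer batch files together could
claim up to

  2 × 1048576 × 8kB ≈ 16GB

of buffer space alone, so these allocations look like a plausible culprit for
the memory accumulating in ExecutorState.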

Here are the full details of the hash table associated with the previous backtrace:

  (gdb) up
  (gdb) up
  (gdb) p *((HashJoinState*)pstate)->hj_HashTable
  $421652 = {
    nbuckets = 1048576,
    log2_nbuckets = 20,
    nbuckets_original = 1048576,
    nbuckets_optimal = 1048576,
    log2_nbuckets_optimal = 20,
    buckets = {unshared = 0x68f12e8, shared = 0x68f12e8},
    keepNulls = true,
    skewEnabled = false,
    skewBucket = 0x0,
    skewBucketLen = 0,
    nSkewBuckets = 0,
    skewBucketNums = 0x0,
    nbatch = 1048576,
    curbatch = 0,
    nbatch_original = 32,
    nbatch_outstart = 1048576,
    growEnabled = false,
    totalTuples = 19541735,
    partialTuples = 19541735,
    skewTuples = 0,
    innerBatchFile = 0xdfcd168,
    outerBatchFile = 0xe7cd1a8,
    outer_hashfunctions = 0x68ed3a0,
    inner_hashfunctions = 0x68ed3f0,
    hashStrict = 0x68ed440,
    spaceUsed = 1302386440,
    spaceAllowed = 67108864,
    spacePeak = 1302386440,
    spaceUsedSkew = 0,
    spaceAllowedSkew = 1342177,
    hashCxt = 0x68ed290,
    batchCxt = 0x68ef2a0,
    chunks = 0x251f28e88,
    current_chunk = 0x0,
    area = 0x0,
    parallel_state = 0x0,
    batches = 0x0,
    current_chunk_shared = 1103827828993
  }

For what it's worth, the contexts are:

  (gdb) p ((HashJoinState*)pstate)->hj_HashTable->hashCxt.name
  $421657 = 0x99e3c0 "HashTableContext"

  (gdb) p ((HashJoinState*)pstate)->hj_HashTable->batchCxt.name
  $421658 = 0x99e3d1 "HashBatchContext"

Regards,


