Re: Memory leak from ExecutorState context? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Memory leak from ExecutorState context?
Date
Msg-id 3013398b-316c-638f-2a73-3783e8e2ef02@enterprisedb.com
In response to Re: Memory leak from ExecutorState context?  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses Re: Memory leak from ExecutorState context?  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
List pgsql-hackers
On 3/1/23 18:48, Jehan-Guillaume de Rorthais wrote:
> Hi Tomas,
> 
> On Tue, 28 Feb 2023 20:51:02 +0100
> Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> On 2/28/23 19:06, Jehan-Guillaume de Rorthais wrote:
>>> * HashBatchContext goes up to 1441MB after 240s, then stays flat until the end
>>>   (400s as the last record)  
>>
>> That's interesting. We're using HashBatchContext for very few things, so
>> how could it consume so much memory? But e.g. the number of buckets
>> should be limited by work_mem, so how could it get to 1.4GB?
>>
>> Can you break at ExecHashIncreaseNumBatches/ExecHashIncreaseNumBuckets
>> and print how many batches/buckets there are?
> 
> I did this test this morning.
> 
> Batches and buckets increased really quickly to 1048576/1048576.
> 

OK. I think 1M buckets is mostly expected for work_mem=64MB. It means
buckets will use 8MB, which leaves ~56B per tuple (we're aiming for
fillfactor 1.0).
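
Just to spell the arithmetic out, here is a rough sketch (back-of-the-envelope
C, assuming 64-bit pointers; not actual PostgreSQL code):

  #include <stdio.h>

  /* rough sketch of the bucket arithmetic for work_mem = 64MB */
  int main(void)
  {
      size_t nbuckets     = 1024 * 1024;                  /* 1M buckets      */
      size_t bucket_bytes = nbuckets * sizeof(void *);    /* 8MB of pointers */
      size_t work_mem     = 64UL * 1024 * 1024;           /* work_mem = 64MB */

      printf("buckets: %zuMB, ~%zuB left per tuple\n",
             bucket_bytes >> 20, (work_mem - bucket_bytes) / nbuckets);
      return 0;
  }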

But 1M batches? I guess that could be problematic. It doesn't seem like
much, but we need 1M files on each side - 1M for the hash table, 1M for
the outer relation. That's 16MB of pointers, but the files are BufFiles
and we keep an 8kB buffer for each of them. That's ~16GB right there :-(

In practice it probably won't be that bad, because not all files will be
allocated/opened concurrently (especially if this is due to many tuples
having the same value). Assuming that's what's happening here, ofc.
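
Again just a sketch of the numbers (assuming 64-bit pointers and the usual
8kB BufFile buffer; not actual PostgreSQL code):

  #include <stdio.h>

  /* rough sketch of the per-batch BufFile overhead for 1M batches */
  int main(void)
  {
      size_t nbatch  = 1024 * 1024;                /* 1M batches             */
      size_t nfiles  = 2 * nbatch;                 /* inner + outer side     */
      size_t ptrs    = nfiles * sizeof(void *);    /* 16MB of pointers       */
      size_t buffers = nfiles * 8192;              /* one 8kB buffer apiece  */

      printf("pointers: %zuMB, BufFile buffers: %zuGB\n",
             ptrs >> 20, buffers >> 30);
      return 0;
  }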

> ExecHashIncreaseNumBatches was really chatty, with hundreds of thousands of
> calls, always short-circuited at 1048576, I guess because of the conditional
> block «/* safety check to avoid overflow */» appearing early in this function.
> 

Hmmm, that's a bit weird, no? I mean, the check is

  /* safety check to avoid overflow */
  if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2)))
     return;

Why would it stop at 1048576? It certainly is not higher than INT_MAX/2,
and with MaxAllocSize = ~1GB the second value should be ~67M
(MaxAllocSize / 16). So what's happening here?
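
A quick sanity check of that limit (just a sketch, with MaxAllocSize
hard-coded as 0x3fffffff and 64-bit pointers assumed):

  #include <limits.h>
  #include <stdio.h>

  /* evaluate the overflow guard from ExecHashIncreaseNumBatches */
  int main(void)
  {
      size_t a = INT_MAX / 2;                                 /* ~1073M */
      size_t b = (size_t) 0x3fffffff / (sizeof(void *) * 2);  /* ~67M   */

      printf("limit = %zu\n", a < b ? a : b);   /* 67108863, i.e. ~67M */
      return 0;
  }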

> I disabled the breakpoint on ExecHashIncreaseNumBatches a few times to make the
> query run faster. Re-enabling it at 19.1GB of memory consumption, it stayed
> silent until the memory exhaustion (around 21 or 22GB, I don't remember exactly).
> 
> The breakpoint on ExecHashIncreaseNumBuckets triggered a few times at the
> beginning, and one last time close to the end of the query execution.
> 
>>> Any idea? How could I help to have a better idea if a leak is actually
>>> occurring and where exactly?
>>
>> Investigating memory leaks is tough, especially for generic memory
>> contexts like ExecutorState :-( Even more so when you can't reproduce it
>> on a machine with custom builds.
>>
>> What I'd try is this:
>>
>> 1) attach breakpoints to all returns in AllocSetAlloc(), printing the
>> pointer and size for the ExecutorState context, something like
>>
>>   break aset.c:783 if strcmp("ExecutorState",context->header.name) == 0
>>   commands
>>   print MemoryChunkGetPointer(chunk) size
>>   cont
>>   end
>>
>> 2) do the same for AllocSetFree()
>>
>> 3) Match the palloc/pfree calls (using the pointer address) to
>> determine which ones are not freed, and do some stats on the sizes.
>> Usually there are only a couple of distinct sizes that account for most
>> of the leaked memory.
> 
> So here is what I ended up with this afternoon, using files, lines and macros
> from REL_11_18:
> 
>   set logging on
>   set pagination off
>   
>   break aset.c:781 if strcmp("ExecutorState",context.name) == 0
>   commands 1
>   print (((char *)(chunk)) + sizeof(struct AllocChunkData))
>   print chunk->size
>   cont
>   end
>   
>   break aset.c:820 if strcmp("ExecutorState",context.name) == 0
>   commands 2
>   print (((char *)(chunk)) + sizeof(struct AllocChunkData))
>   print chunk->size
>   cont 
>   end
>   
>   break aset.c:979 if strcmp("ExecutorState",context.name) == 0
>   commands 3
>   print (((char *)(chunk)) + sizeof(struct AllocChunkData))
>   print chunk->size
>   cont 
>   end
>   
>   break AllocSetFree if strcmp("ExecutorState",context.name) == 0
>   commands 4 
>   print pointer
>   cont
>   end
> 
> So far, gdb has used more than 3h of CPU time and is eating 2.4GB of memory.
> The backend has had only 3'32" of CPU time:
> 
>    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>   2727284   2.4g  17840 R  99.0  7.7 181:25.07 gdb
>   9054688 220648 103056 t   1.3  0.7   3:32.05 postmaster
> 
> Interestingly, the RES memory of the backend has not exploded yet, but VIRT is
> already high.
> 
> I suppose the query will run for some more hours; hopefully gdb will not
> exhaust the memory in the meantime...
> 
> You'll find some intermediate stats I already collected in the attachment:
> 
> * breakpoints 1, 2 and 3 are from AllocSetAlloc, breakpoint 4 is from AllocSetFree.
> * most of the non-free'd chunks were allocated at the very beginning, before
>   the 5000th allocation call, out of almost 1M calls so far.
> * 3754 of them have a chunk->size of 0
> * it seems there are some buggy stats or data:
>        # this one actually really comes from the gdb log
>        0x38a77b8: break=3 num=191       sz=4711441762604810240 (weird sz)
>        # this one might be a bug in my script
>              0x2: break=2 num=945346    sz=2                   (weird address)
> * ignoring the weird size requested during the 191st call, the total amount
>   of non-free'd memory is currently 5488MB
> 
> I couldn't print "size" as it is optimized away; that's why I tracked
> chunk->size... Is there anything wrong with my current run and gdb log?
> 

There's definitely something wrong. The size should not be 0, and neither
should it be > 1GB. I suspect it's because some of the variables get
optimized out, and gdb just picks up some nonsense :-(

I guess you'll need to debug the individual breakpoints and see what's
available. It probably depends on the compiler version, etc. For example,
I don't see "chunk" for breakpoint 3, but "chunk_size" works, and I can
print the chunk pointer with a bit of arithmetic:

  p (block->freeptr - chunk_size)

I suppose similar gymnastics could work for the other breakpoints.
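
For the record, why that expression works there (paraphrasing the tail of
AllocSetAlloc from memory, so treat this as a sketch rather than the exact
REL_11 code):

  /*
   * Roughly, the normal allocation path ends with:
   *
   *   chunk = (AllocChunk) block->freeptr;
   *   block->freeptr += (chunk_size + ALLOC_CHUNKHDRSZ);
   *   ...
   *   return AllocChunkGetPointer(chunk);
   *
   * AllocChunkGetPointer(chunk) is (char *) chunk + ALLOC_CHUNKHDRSZ, and
   * freeptr now points at (char *) chunk + ALLOC_CHUNKHDRSZ + chunk_size,
   * so "block->freeptr - chunk_size" is exactly the pointer palloc returned.
   */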


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


