Re: Memory leak from ExecutorState context? - Mailing list pgsql-hackers
From: Jehan-Guillaume de Rorthais
Subject: Re: Memory leak from ExecutorState context?
Msg-id: 20230302130838.717e888d@karst
In response to: Re: Memory leak from ExecutorState context? (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
On Thu, 2 Mar 2023 01:30:27 +0100
Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> On 3/2/23 00:18, Jehan-Guillaume de Rorthais wrote:
> >>> ExecHashIncreaseNumBatches was really chatty, having hundreds of
> >>> thousands of calls, always short-cut'ed to 1048576, I guess because of
> >>> the conditional block «/* safety check to avoid overflow */» appearing
> >>> early in this function.
> > [...] But what about this other short-cut then?
> >
> >     /* do nothing if we've decided to shut off growth */
> >     if (!hashtable->growEnabled)
> >         return;
> >
> >     [...]
> >
> >     /*
> >      * If we dumped out either all or none of the tuples in the table,
> >      * disable further expansion of nbatch.  This situation implies that
> >      * we have enough tuples of identical hashvalues to overflow
> >      * spaceAllowed.  Increasing nbatch will not fix it since there's no
> >      * way to subdivide the group any more finely.  We have to just gut
> >      * it out and hope the server has enough RAM.
> >      */
> >     if (nfreed == 0 || nfreed == ninmemory)
> >     {
> >         hashtable->growEnabled = false;
> > #ifdef HJDEBUG
> >         printf("Hashjoin %p: disabling further increase of nbatch\n",
> >                hashtable);
> > #endif
> >     }
> >
> > If I guess correctly, the function is not able to split the current
> > batch, so it sits and hopes. This is a much better suspect and I can
> > surely track this from gdb.
>
> Yes, this would make much more sense - it'd be consistent with the
> hypothesis that this is due to number of batches exploding (it's a
> protection exactly against that).
>
> You specifically mentioned the other check earlier, but now I realize
> you've been just speculating it might be that.

Yes, sorry about that, I jumped on this speculation without actually
digging into it much...

[...]

> But I have another idea - put a breakpoint on makeBufFile() which is the
> bit that allocates the temp files including the 8kB buffer, and print in
> what context we allocate that. I have a hunch we may be allocating it in
> the ExecutorState. That'd explain all the symptoms.

That's what I was wondering as well yesterday night. So, on your advice,
I set a breakpoint on makeBufFile:

  (gdb) info br
  Num     Type           Disp Enb Address            What
  1       breakpoint     keep y   0x00000000007229df in makeBufFile
          bt 10
          p CurrentMemoryContext.name

Then I disabled it and ran the query up to this memory usage:

    VIRT    RES    SHR S  %CPU  %MEM
   20.1g   7.0g  88504 t   0.0  22.5

Then I enabled the breakpoint and looked at around 600 backtraces and
context names before getting bored. They **all** looked like this:

  Breakpoint 1, BufFileCreateTemp (...) at buffile.c:201
  201     in buffile.c
  #0  BufFileCreateTemp (...) buffile.c:201
  #1  ExecHashJoinSaveTuple (tuple=0x1952c180, ...) nodeHashjoin.c:1238
  #2  ExecHashJoinImpl (parallel=false, pstate=0x31a6418) nodeHashjoin.c:398
  #3  ExecHashJoin (pstate=0x31a6418) nodeHashjoin.c:584
  #4  ExecProcNodeInstr (node=<optimized out>) execProcnode.c:462
  #5  ExecProcNode (node=0x31a6418)
  #6  ExecSort (pstate=0x31a6308)
  #7  ExecProcNodeInstr (node=<optimized out>)
  #8  ExecProcNode (node=0x31a6308)
  #9  fetch_input_tuple (aggstate=aggstate@entry=0x31a5ea0)

  $421643 = 0x99d7f7 "ExecutorState"

These 600-ish 8kB buffers were all allocated in "ExecutorState". I could
probably log many more of them if more checks/stats need to be collected,
but it slows the query down a lot, granting it only 1-5% of CPU time
instead of the usual 100%.
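For reference, the code path those backtraces land in looks roughly like
this (paraphrased from src/backend/executor/nodeHashjoin.c; exact
signatures and error handling vary across versions). Nothing on this path
switches memory context, so BufFileCreateTemp(), and makeBufFile() below
it, palloc the BufFile with its embedded 8kB buffer in whatever
CurrentMemoryContext is active, which at this point is the per-query
ExecutorState:

    /* paraphrased sketch, not the verbatim source */
    void
    ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                          BufFile **fileptr)
    {
        BufFile    *file = *fileptr;

        if (file == NULL)
        {
            /*
             * First write to this batch file, so open it.  The BufFile
             * (8kB buffer included) is palloc'd in CurrentMemoryContext,
             * i.e. ExecutorState when called from ExecHashJoinImpl().
             */
            file = BufFileCreateTemp(false);
            *fileptr = file;
        }

        BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
        BufFileWrite(file, (void *) tuple, tuple->t_len);
    }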
So it's not exactly a leak, as the memory would be released at the end of
the query, but I suppose these buffers should be allocated in a
shorter-lived context, to avoid this memory bloat. Am I right?

> BTW with how many batches does the hash join start?

* batches went from 32 to 1048576 before being growEnabled=false, as
  suspected
* original and current nbuckets were set to 1048576 immediately
* allowed space is set to work_mem, but current space usage is 1.3GB, as
  measured previously, shortly before the system refused more memory
  allocations

Here are the full details of the hash table associated with the previous
backtrace:

  (gdb) up
  (gdb) up
  (gdb) p *((HashJoinState*)pstate)->hj_HashTable
  $421652 = {
    nbuckets = 1048576,
    log2_nbuckets = 20,
    nbuckets_original = 1048576,
    nbuckets_optimal = 1048576,
    log2_nbuckets_optimal = 20,
    buckets = {unshared = 0x68f12e8, shared = 0x68f12e8},
    keepNulls = true,
    skewEnabled = false,
    skewBucket = 0x0,
    skewBucketLen = 0,
    nSkewBuckets = 0,
    skewBucketNums = 0x0,
    nbatch = 1048576,
    curbatch = 0,
    nbatch_original = 32,
    nbatch_outstart = 1048576,
    growEnabled = false,
    totalTuples = 19541735,
    partialTuples = 19541735,
    skewTuples = 0,
    innerBatchFile = 0xdfcd168,
    outerBatchFile = 0xe7cd1a8,
    outer_hashfunctions = 0x68ed3a0,
    inner_hashfunctions = 0x68ed3f0,
    hashStrict = 0x68ed440,
    spaceUsed = 1302386440,
    spaceAllowed = 67108864,
    spacePeak = 1302386440,
    spaceUsedSkew = 0,
    spaceAllowedSkew = 1342177,
    hashCxt = 0x68ed290,
    batchCxt = 0x68ef2a0,
    chunks = 0x251f28e88,
    current_chunk = 0x0,
    area = 0x0,
    parallel_state = 0x0,
    batches = 0x0,
    current_chunk_shared = 1103827828993
  }

For what it's worth, the contexts are:

  (gdb) p ((HashJoinState*)pstate)->hj_HashTable->hashCxt.name
  $421657 = 0x99e3c0 "HashTableContext"
  (gdb) p ((HashJoinState*)pstate)->hj_HashTable->batchCxt.name
  $421658 = 0x99e3d1 "HashBatchContext"

A back-of-envelope check and a rough sketch of what I mean by a
shorter-lived context follow below.

Regards,
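A back-of-envelope check of the numbers above: with nbatch = 1048576 and
one inner plus one outer temp file per batch, that is up to 2 * 1048576
BufFiles, each with an 8kB buffer, i.e. roughly 16GB in the worst case
where every batch file ends up open. That is consistent with the ~20GB
VIRT observed before the system refused further allocations.

And here is a minimal sketch of the kind of change suggested above. It is
hypothetical, not a patch: the extra "hashtable" parameter does not exist
in the current ExecHashJoinSaveTuple() signature and is only there to
reach a context owned by the hash join:

    /* hypothetical variant: pin the batch file to the hash table's context */
    static void
    SaveTupleInHashTableCxt(MinimalTuple tuple, uint32 hashvalue,
                            BufFile **fileptr, HashJoinTable hashtable)
    {
        BufFile    *file = *fileptr;

        if (file == NULL)
        {
            /* create the temp file under hashCxt, not ExecutorState */
            MemoryContext oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);

            file = BufFileCreateTemp(false);
            MemoryContextSwitchTo(oldcxt);
            *fileptr = file;
        }

        BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
        BufFileWrite(file, (void *) tuple, tuple->t_len);
    }

The buffers would then be freed by ExecHashTableDestroy() instead of
lingering until executor shutdown, and they would be accounted under
HashTableContext in memory context dumps, where this bloat belongs.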