Re: Memory leak from ExecutorState context? - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Memory leak from ExecutorState context?
Date:
Msg-id: 41c5766d-ed71-b70c-bbbc-d3396c462d62@enterprisedb.com
In response to: Re: Memory leak from ExecutorState context? (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses: Re: Memory leak from ExecutorState context? (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
List: pgsql-hackers
On 3/2/23 00:18, Jehan-Guillaume de Rorthais wrote:
> Hi,
>
> On Wed, 1 Mar 2023 20:29:11 +0100
> Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> On 3/1/23 18:48, Jehan-Guillaume de Rorthais wrote:
>>> On Tue, 28 Feb 2023 20:51:02 +0100
>>> Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>>> On 2/28/23 19:06, Jehan-Guillaume de Rorthais wrote:
>>>>> * HashBatchContext goes up to 1441MB after 240s then stays flat until
>>>>>   the end (400s as the last record)
>>>>
>>>> That's interesting. We're using HashBatchContext for very few things, so
>>>> what could consume so much memory? But e.g. the number of buckets
>>>> should be limited by work_mem, so how could it get to 1.4GB?
>>>>
>>>> Can you break at ExecHashIncreaseNumBatches/ExecHashIncreaseNumBuckets
>>>> and print how many batches/buckets are there?
>>>
>>> I did this test this morning.
>>>
>>> Batches and buckets increased really quickly to 1048576/1048576.
>>
>> OK. I think 1M buckets is mostly expected for work_mem=64MB. It means
>> buckets will use 8MB, which leaves ~56B per tuple (we're aiming for
>> fillfactor 1.0).
>>
>> But 1M batches? I guess that could be problematic. It doesn't seem like
>> much, but we need 1M files on each side - 1M for the hash table, 1M for
>> the outer relation. That's 16MB of pointers, but the files are BufFile
>> and we keep an 8kB buffer for each of them. That's ~16GB right there :-(
>>
>> In practice it probably won't be that bad, because not all files will be
>> allocated/opened concurrently (especially if this is due to many tuples
>> having the same value). Assuming that's what's happening here, ofc.
>
> And I suppose they are closed/freed concurrently as well?

Yeah. There can be different subsets of the files used, depending on when
the number of batches starts to explode, etc.
>>> ExecHashIncreaseNumBatches was really chatty, having hundreds of thousands
>>> of calls, always short-cut'ed to 1048576, I guess because of the
>>> conditional block «/* safety check to avoid overflow */» appearing early
>>> in this function.
>>
>> Hmmm, that's a bit weird, no? I mean, the check is
>>
>>     /* safety check to avoid overflow */
>>     if (oldnbatch > Min(INT_MAX / 2, MaxAllocSize / (sizeof(void *) * 2)))
>>         return;
>>
>> Why would it stop at 1048576? It certainly is not higher than INT_MAX/2
>> and with MaxAllocSize = ~1GB the second value should be ~33M. So what's
>> happening here?
>
> Indeed, not the right suspect. But what about this other short-cut then?
>
>     /* do nothing if we've decided to shut off growth */
>     if (!hashtable->growEnabled)
>         return;
>
>     [...]
>
>     /*
>      * If we dumped out either all or none of the tuples in the table,
>      * disable further expansion of nbatch.  This situation implies that
>      * we have enough tuples of identical hashvalues to overflow
>      * spaceAllowed.  Increasing nbatch will not fix it since there's no
>      * way to subdivide the group any more finely.  We have to just gut it
>      * out and hope the server has enough RAM.
>      */
>     if (nfreed == 0 || nfreed == ninmemory)
>     {
>         hashtable->growEnabled = false;
> #ifdef HJDEBUG
>         printf("Hashjoin %p: disabling further increase of nbatch\n",
>                hashtable);
> #endif
>     }
>
> If I guess correctly, the function is not able to split the current batch,
> so it sits and hopes. This is a much better suspect and I can surely track
> this from gdb.

Yes, this would make much more sense - it'd be consistent with the hypothesis
that this is due to the number of batches exploding (it's a protection exactly
against that). You specifically mentioned the other check earlier, but now I
realize you were just speculating it might be that.

> Being able to find the fields involved in the join could help as well, to
> check or gather some stats about them, but I haven't had time to dig into
> this yet...
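[A toy illustration of the shut-off heuristic quoted above, for readers following along. This is not PostgreSQL code, and `grow_still_useful` is a made-up name: when a batch is split by one more hash bit, either some tuples move to the new batch (splitting helps), or none/all of them move, which means the tuples share a hash value and further doubling of nbatch is pointless:]

```c
#include <stdbool.h>
#include <stdint.h>

/* Would doubling nbatch help? Split the batch on one extra hash bit and
 * count how many tuples would move out (the "nfreed" of the real code). */
bool grow_still_useful(const uint32_t *hashvalues, int ntuples, uint32_t newbit)
{
    int nfreed = 0;     /* tuples that would move to the new batch */

    for (int i = 0; i < ntuples; i++)
        if (hashvalues[i] & newbit)
            nfreed++;

    /* all-or-nothing split => identical hash values => disable growth */
    return nfreed != 0 && nfreed != ntuples;
}
```

With duplicate hash values (e.g. three tuples all hashing to 0x5, split on bit 0x4) every tuple lands on the same side, so growth gets disabled and the batch just keeps accumulating - the situation suspected in this thread.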
It's going to be tricky, because all parts of the plan may be doing
something, and there may be multiple hash joins. So you won't know if you're
executing the part of the plan that's causing issues :-(

But I have another idea - put a breakpoint on makeBufFile(), which is the
bit that allocates the temp files including the 8kB buffer, and print the
context in which we allocate that. I have a hunch we may be allocating it in
the ExecutorState. That'd explain all the symptoms.

BTW with how many batches does the hash join start?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
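[A rough, untested sketch of the gdb session suggested above. It assumes a server built with debug symbols, a known backend PID, and that makeBufFile() (a static function in buffile.c) and the CurrentMemoryContext global are visible to the debugger:]

```shell
# Hypothetical recipe, not from the thread: attach to the backend running
# the query and log the memory context each BufFile is allocated in.
gdb -p "$BACKEND_PID" <<'EOF'
break makeBufFile
commands
  silent
  # print the name of the context the allocation will land in
  printf "BufFile allocated in context: %s\n", CurrentMemoryContext->name
  continue
end
continue
EOF
```

If the printed context is ExecutorState rather than a hash-join context, that would support the hunch above, since ExecutorState lives for the whole query.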