Re: Memory leak from ExecutorState context? - Mailing list pgsql-hackers
From: Jehan-Guillaume de Rorthais
Subject: Re: Memory leak from ExecutorState context?
Msg-id: 20230328151745.0f6061f8@karst
In response to: Re: Memory leak from ExecutorState context? (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: Memory leak from ExecutorState context?
List: pgsql-hackers
On Tue, 28 Mar 2023 00:43:34 +0200
Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> On 3/27/23 23:13, Jehan-Guillaume de Rorthais wrote:
> > Please, find in attachment a patch to allocate bufFiles in a dedicated
> > context. I picked up your patch, backpatch'd it, went through it and did
> > some minor changes to it. I have some comments/questions though.
> >
> > 1. I'm not sure why we must allocate the "HashBatchFiles" new context
> > under ExecutorState and not under hashtable->hashCxt?
> >
> > The only references I could find were in hashjoin.h:30:
> >
> >   /* [...]
> >    * [...] (Exception: data associated with the temp files lives in the
> >    * per-query context too, since we always call buffile.c in that
> >    * context.)
> >
> > And in nodeHashjoin.c:1243:ExecHashJoinSaveTuple() (I reworded this
> > original comment in the patch):
> >
> >   /* [...]
> >    * Note: it is important always to call this in the regular executor
> >    * context, not in a shorter-lived context; else the temp file buffers
> >    * will get messed up.
> >
> > But these are not an explanation of why BufFile related allocations must
> > be under a per-query context.
>
> Doesn't that simply describe the current (unpatched) behavior where
> BufFile is allocated in the per-query context?

I wasn't sure. The first quote from hashjoin.h seems to describe a stronger
rule about «**always** call buffile.c in per-query context». But maybe it
ought to be «always call buffile.c from one of the sub-query contexts»?
I assume the aim is to enforce the removal of the temp files on query
end/error?

> I mean, the current code calls BufFileCreateTemp() without switching the
> context, so it's in the ExecutorState. But with the patch it very clearly is
> not.
>
> And I'm pretty sure the patch should do
>
>     hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt,
>                                                "HashBatchFiles",
>                                                ALLOCSET_DEFAULT_SIZES);
>
> and it'd still work. Or why do you think we *must* allocate it under
> ExecutorState?

That was actually my very first patch and it indeed worked. But I was
confused by the previously quoted code comments. That's why I kept your
original code and decided to raise the discussion here.

Fixed in the new patch in attachment.

> FWIW The comment in hashjoin.h needs updating to reflect the change.

Done in the last patch. Is my rewording accurate?

> > 2. Wrapping each call of ExecHashJoinSaveTuple() with a memory context
> > switch seems fragile as it could be forgotten in future code paths/changes.
> > So I added an Assert() in the function to make sure the current memory
> > context is "HashBatchFiles" as expected.
> > Another way to tie this up might be to pass the memory context as an
> > argument to the function.
> > ... Or maybe I'm over precautionary.
>
> I'm not sure I'd call that fragile, we have plenty of other code that
> expects the memory context to be set correctly. Not sure about the
> assert, but we don't have similar asserts anywhere else.

I mostly stuck it there to stimulate the discussion around this, as I needed
to scratch that itch.

> But I think it's just ugly and overly verbose

+1

Your patch was just a demo/debug patch at the time, so it needed some cleanup
now :)

> it'd be much nicer to e.g. pass the memory context as a parameter, and do
> the switch inside.

That was a proposition in my previous mail, so I did it in the new patch.
Let's see what other reviewers think.
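To make that idea concrete, a minimal sketch of such a signature could look
like the code below. This is only an illustration, not necessarily the code
of the attached patch: the extra filecxt parameter is assumed, and the
write-error checks of the real function are elided for brevity.

    #include "postgres.h"

    #include "access/htup_details.h"    /* MinimalTuple, t_len */
    #include "storage/buffile.h"        /* BufFile, BufFileCreateTemp(), BufFileWrite() */
    #include "utils/memutils.h"         /* MemoryContext, MemoryContextSwitchTo() */

    /*
     * Hypothetical variant of ExecHashJoinSaveTuple(): the BufFile memory
     * context is passed as a parameter and the switch happens inside, so
     * callers cannot forget it.
     */
    void
    ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                          BufFile **fileptr, MemoryContext filecxt)
    {
        BufFile    *file = *fileptr;

        if (file == NULL)
        {
            /*
             * First write to this batch: create its temp file in filecxt, so
             * the BufFile bookkeeping is not leaked into ExecutorState.
             */
            MemoryContext oldcxt = MemoryContextSwitchTo(filecxt);

            file = BufFileCreateTemp(false);
            *fileptr = file;

            MemoryContextSwitchTo(oldcxt);
        }

        BufFileWrite(file, &hashvalue, sizeof(uint32));
        BufFileWrite(file, tuple, tuple->t_len);
    }

Passing the whole HashJoinTable instead of just the memory context would work
as well; that is mostly a matter of taste.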
> > 3. You wrote:
> >
> >>> A separate BufFile memory context helps, although people won't see it
> >>> unless they attach a debugger, I think. Better than nothing, but I was
> >>> wondering if we could maybe print some warnings when the number of batch
> >>> files gets too high ...
> >
> > So I added a WARNING when the batches' memory is exhausting the memory
> > size allowed.
> >
> >     + if (hashtable->fileCxt->mem_allocated > hashtable->spaceAllowed)
> >     +     elog(WARNING, "Growing number of hash batch is exhausting
> >             memory");
> >
> > This is repeated on each call of ExecHashIncreaseNumBatches when BufFile
> > overflows the memory budget. I realize now I should probably add the
> > memory limit, the number of current batches and their memory consumption.
> > The message is probably too cryptic for a user. It could probably be
> > reworded, but some doc or additional hint around this message might help.
>
> Hmmm, not sure if WARNING is a good approach, but I don't have a better
> idea at the moment.

I stepped it down to NOTICE and added some more info.

Here is the output of the last patch with a 1MB work_mem:

  =# explain analyze select * from small join large using (id);
  WARNING: increasing number of batches from 1 to 2
  WARNING: increasing number of batches from 2 to 4
  WARNING: increasing number of batches from 4 to 8
  WARNING: increasing number of batches from 8 to 16
  WARNING: increasing number of batches from 16 to 32
  WARNING: increasing number of batches from 32 to 64
  WARNING: increasing number of batches from 64 to 128
  WARNING: increasing number of batches from 128 to 256
  WARNING: increasing number of batches from 256 to 512
  NOTICE: Growing number of hash batch to 512 is exhausting allowed memory (2164736 > 2097152)
  WARNING: increasing number of batches from 512 to 1024
  NOTICE: Growing number of hash batch to 1024 is exhausting allowed memory (4329472 > 2097152)
  WARNING: increasing number of batches from 1024 to 2048
  NOTICE: Growing number of hash batch to 2048 is exhausting allowed memory (8626304 > 2097152)
  WARNING: increasing number of batches from 2048 to 4096
  NOTICE: Growing number of hash batch to 4096 is exhausting allowed memory (17252480 > 2097152)
  WARNING: increasing number of batches from 4096 to 8192
  NOTICE: Growing number of hash batch to 8192 is exhausting allowed memory (34504832 > 2097152)
  WARNING: increasing number of batches from 8192 to 16384
  NOTICE: Growing number of hash batch to 16384 is exhausting allowed memory (68747392 > 2097152)
  WARNING: increasing number of batches from 16384 to 32768
  NOTICE: Growing number of hash batch to 32768 is exhausting allowed memory (137494656 > 2097152)

                                 QUERY PLAN
  --------------------------------------------------------------------------
   Hash Join  (cost=6542057.16..7834651.23 rows=7 width=74) (actual time=558502.127..724007.708 rows=7040 loops=1)
     Hash Cond: (small.id = large.id)
     ->  Seq Scan on small  (cost=0.00..940094.00 rows=94000000 width=41) (actual time=0.035..3.666 rows=10000 loops=1)
     ->  Hash  (cost=6542057.07..6542057.07 rows=7 width=41) (actual time=558184.152..558184.153 rows=700000000 loops=1)
           Buckets: 32768 (originally 1024)  Batches: 32768 (originally 1)  Memory Usage: 1921kB
           ->  Seq Scan on large  (cost=0.00..6542057.07 rows=7 width=41) (actual time=0.324..193750.567 rows=700000000 loops=1)
   Planning Time: 1.588 ms
   Execution Time: 724011.074 ms
  (8 rows)

Regards,
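If the message is to carry the memory limit, the batch count and the current
consumption as suggested above, it could eventually take a shape like the
sketch below. The field names (fileCxt, mem_allocated, spaceAllowed, nbatch)
follow the patch under discussion, but the wording, severity and hint are
only assumptions, not the message actually emitted by the patch.

    /*
     * Sketch only: a more detailed version of the NOTICE from point 3,
     * reporting the number of batches, the memory consumed by their
     * BufFiles and the allowed budget.
     */
    if (hashtable->fileCxt->mem_allocated > hashtable->spaceAllowed)
        ereport(NOTICE,
                (errmsg("hash batch files are exhausting the allowed memory"),
                 errdetail("%d batch files currently use %lld bytes while only %zu bytes are allowed.",
                           hashtable->nbatch,
                           (long long) hashtable->fileCxt->mem_allocated,
                           hashtable->spaceAllowed),
                 errhint("Consider increasing work_mem.")));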