Re: Memory leak from ExecutorState context? - Mailing list pgsql-hackers
From: Melanie Plageman
Subject: Re: Memory leak from ExecutorState context?
Date:
Msg-id: 20230508155648.lzddwywo6emote2b@liskov
In response to: Re: Memory leak from ExecutorState context? (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses: Re: Memory leak from ExecutorState context?
List: pgsql-hackers
Thanks for continuing to work on this!

On Thu, May 04, 2023 at 07:30:06PM +0200, Jehan-Guillaume de Rorthais wrote:
> On Fri, 21 Apr 2023 16:44:48 -0400 Melanie Plageman <melanieplageman@gmail.com> wrote:
...
> > I think the biggest change that is needed is to implement this memory
> > context usage for parallel hash join.
>
> Indeed.
...
> 4. accessor->read_buffer is currently allocated in accessor->context as well.
>
>    This buffer holds tuples read from the fileset. This is still a buffer, but
>    not related to any file anymore...
>
> Because of 3 and 4, and the gross context granularity of SharedTuplestore
> related-code, I'm now wondering if "fileCxt" shouldn't be renamed "bufCxt".

I think bufCxt is a potentially confusing name. The context contains
pointers to the batch files, and saying those are related to buffers is
confusing. It also sounds like it could include any kind of buffer
related to the hashtable or hash join.

Perhaps we could call this memory context the "spillCxt" -- indicating it
holds the memory required for spilling to permanent storage while
executing hash joins. I discuss this more in my code review below.
> From c5ed2ae2c2749af4f5058b012dc5e8a9e1529127 Mon Sep 17 00:00:00 2001
> From: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
> Date: Mon, 27 Mar 2023 15:54:39 +0200
> Subject: [PATCH 2/3] Allocate hash batches related BufFile in a dedicated
>  context
>
> diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
> index 5fd1c5553b..a4fbf29301 100644
> --- a/src/backend/executor/nodeHash.c
> +++ b/src/backend/executor/nodeHash.c
> @@ -570,15 +574,21 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
>
>  	if (nbatch > 1 && hashtable->parallel_state == NULL)
>  	{
> +		MemoryContext oldctx;
> +
>  		/*
>  		 * allocate and initialize the file arrays in hashCxt (not needed for
>  		 * parallel case which uses shared tuplestores instead of raw files)
>  		 */
> +		oldctx = MemoryContextSwitchTo(hashtable->fileCxt);
> +
>  		hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
>  		hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
>  		/* The files will not be opened until needed... */
>  		/* ... but make sure we have temp tablespaces established for them */

I haven't studied it closely, but I wonder if we shouldn't switch back
into the oldctx before calling PrepareTempTablespaces().
PrepareTempTablespaces() is explicit about which memory context it
allocates in; however, I'm not sure it makes sense to call it in the
fileCxt. If you have a reason to do so, you should add a comment
explaining it and do the same in ExecHashIncreaseNumBatches().
>  		PrepareTempTablespaces();
> +
> +		MemoryContextSwitchTo(oldctx);
>  	}
>
> @@ -934,13 +943,16 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
>  			   hashtable, nbatch, hashtable->spaceUsed);
>  #endif
>
> -	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
> -
>  	if (hashtable->innerBatchFile == NULL)
>  	{
> +		MemoryContext oldcxt = MemoryContextSwitchTo(hashtable->fileCxt);
> +
>  		/* we had no file arrays before */
>  		hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
>  		hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
> +

As mentioned above, you should likely make ExecHashTableCreate()
consistent with this.

> +		MemoryContextSwitchTo(oldcxt);
> +
>  		/* time to establish the temp tablespaces, too */
>  		PrepareTempTablespaces();
>  	}
>
> @@ -951,8 +963,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)

I don't see a reason to call repalloc0_array() in a different context
than the initial palloc0_array().

>  		hashtable->outerBatchFile = repalloc0_array(hashtable->outerBatchFile, BufFile *, oldnbatch, nbatch);
>  	}
>
> -	MemoryContextSwitchTo(oldcxt);
> -
>  	hashtable->nbatch = nbatch;
>
>  	/*
> diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
> index 920d1831c2..ac72fbfbb6 100644
> --- a/src/backend/executor/nodeHashjoin.c
> +++ b/src/backend/executor/nodeHashjoin.c
> @@ -485,8 +485,10 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
>  		 */
>  		Assert(parallel_state == NULL);
>  		Assert(batchno > hashtable->curbatch);
> +
>  		ExecHashJoinSaveTuple(mintuple, hashvalue,
> -							  &hashtable->outerBatchFile[batchno]);
> +							  &hashtable->outerBatchFile[batchno],
> +							  hashtable->fileCxt);
>
>  		if (shouldFree)
>  			heap_free_minimal_tuple(mintuple);
>
> @@ -1308,21 +1310,27 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
>   * The data recorded in the file for each tuple is its hash value,

It doesn't sound accurate to me to say that it should be called *in* the
HashBatchFiles context.
We now switch into that context, so you can probably remove this comment
and instead add a comment above the switch into the filecxt explaining
that the temp file buffers should be allocated there, both because they
need a sufficiently long-lived context and to increase the visibility of
their memory overhead.

>   * then the tuple in MinimalTuple format.
>   *
> - * Note: it is important always to call this in the regular executor
> - * context, not in a shorter-lived context; else the temp file buffers
> - * will get messed up.
> + * Note: it is important always to call this in the HashBatchFiles context,
> + * not in a shorter-lived context; else the temp file buffers will get messed
> + * up.
>   */
>  void
>  ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
> -					  BufFile **fileptr)
> +					  BufFile **fileptr, MemoryContext filecxt)
>  {
>  	BufFile    *file = *fileptr;
>
>  	if (file == NULL)
>  	{
> +		MemoryContext oldctx;
> +
> +		oldctx = MemoryContextSwitchTo(filecxt);
> +
>  		/* First write to this batch file, so open it. */
>  		file = BufFileCreateTemp(false);
>  		*fileptr = file;
> +
> +		MemoryContextSwitchTo(oldctx);
>  	}
>
> diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
> index 8ee59d2c71..74867c3e40 100644
> --- a/src/include/executor/hashjoin.h
> +++ b/src/include/executor/hashjoin.h
> @@ -25,10 +25,14 @@
>   *
>   * Each active hashjoin has a HashJoinTable control block, which is
>   * palloc'd in the executor's per-query context. All other storage needed
> - * for the hashjoin is kept in private memory contexts, two for each hashjoin.
> + * for the hashjoin is kept in private memory contexts, three for each
> + * hashjoin:

Maybe "hash table control block". I know the phrase "control block" is
used elsewhere in the comments, but it is a bit confusing. Also, I wish
there were a way to make it clear this is for the hashtable but relevant
to all batches.
> + * - HashTableContext (hashCxt): the control block associated to the hash table

I might say "per-batch storage for a serial hash join".

> + * - HashBatchContext (batchCxt): storages for batches

So, if we are going to allocate the array of pointers to the spill files
in the fileCxt, we should revise this comment. As I mentioned above, I
agree with you that if the SharedTupleStore-related buffers are also
allocated in this context, perhaps it shouldn't be called the fileCxt.

One idea I had is calling it the spillCxt. Almost all memory allocated
in this context is related to needing to spill to permanent storage
during execution.

The one potentially confusing part of this is batch 0 for parallel hash
join. I would have to dig back into it again, but from a cursory look at
ExecParallelHashJoinSetUpBatches(), it seems like batch 0 also gets a
shared tuplestore with associated accessors and files, even if it is a
single-batch parallel hashjoin. Are the parallel hash join read_buffer
and write_chunk also used for a single-batch parallel hash join?

Though, perhaps spillCxt still makes sense? It doesn't specify
multi-batch...

> --- a/src/include/executor/nodeHashjoin.h
> +++ b/src/include/executor/nodeHashjoin.h
> @@ -29,6 +29,6 @@ extern void ExecHashJoinInitializeWorker(HashJoinState *state,
>  										 ParallelWorkerContext *pwcxt);

I would add a comment explaining why ExecHashJoinSaveTuple() is passed
the fileCxt as a parameter.

> extern void ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
> -								  BufFile **fileptr);
> +								  BufFile **fileptr, MemoryContext filecxt);

- Melanie