Re: Use BumpContext contexts for TupleHashTables' tablecxt - Mailing list pgsql-hackers
| From | Tom Lane | 
|---|---|
| Subject | Re: Use BumpContext contexts for TupleHashTables' tablecxt | 
| Date | |
| Msg-id | 2927093.1761783733@sss.pgh.pa.us | 
| In response to | Re: Use BumpContext contexts for TupleHashTables' tablecxt (David Rowley <dgrowleyml@gmail.com>) | 
| Responses | Re: Use BumpContext contexts for TupleHashTables' tablecxt | 
| List | pgsql-hackers | 
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 30 Oct 2025 at 08:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Here's a v2 that tries to clean up some of the mess.
> Thanks for doing that cleanup. A good improvement.
Thanks for reviewing!
> Just a couple of
> notes from the review for your consideration:
> 1) The comment in ExecEndSetOp() says: "/* free subsidiary stuff
> including hashtable */". How about adjusting that to "hashtable data"?
Agreed, also in ExecEndRecursiveUnion.
> Is it worth mentioning there that we leave the destruction of the
> hash table itself to when the ExecutorState context is reset?
> (Apparently execGrouping.c does not expose any way to do
> tuplehash_destroy() anyway)
Yeah.  I think this is better documented in execGrouping.c.  I propose
adding this:
@@ -169,6 +169,13 @@ execTuplesHashPrepare(int numCols,
  * reasonably often, typically once per tuple.  (We do it that way, rather
  * than managing an extra context within the hashtable, because in many cases
  * the caller can specify a tempcxt that it needs to reset per-tuple anyway.)
+ *
+ * We don't currently provide DestroyTupleHashTable functionality; the hash
+ * table will be cleaned up at destruction of the metacxt.  (Some callers
+ * bother to delete the tuplescxt explicitly, though it'd be sufficient to
+ * ensure it's a child of the metacxt.)  There's not much point in working
+ * harder than this so long as the expression-evaluation infrastructure
+ * behaves similarly.
  */
 TupleHashTable
 BuildTupleHashTable(PlanState *parent,
> 2) I'm not sure if it makes it neater or not, but did you consider
> moving the creation of the setopstate->tuplesContext context into
> build_hash_table()? The motivation there is that it keeps the
> hash-related initialisations together.
Hmm, could do that, but all of these callers currently follow the
pattern of creating these contexts in the plan nodes' Init functions
and destroying them in the End functions.  I don't see a big advantage
in changing that.
> 3) Not for this patch, but wanted to note the observation: I see we
> always create the hash table during plan startup in nodeSetOp.c. I
> suspect this could cause the same issue for Setops as I fixed in
> Memoize in 57f59396b, which fixed a slow EXPLAIN in [1].
Fair point.  I'd be inclined to just skip the hashtable creation
if (eflags & EXEC_FLAG_EXPLAIN_ONLY), like nodeAgg.c did.
I wonder though if skipping the initialization of the hashtable
execution expressions has any visible impact on what EXPLAIN can
print.  I have a vague recollection that there were some places
where we backed off similar optimizations because of concerns
like that.
>> I now think that if we care, the right thing is to make the remaining
>> callers construct extra memory contexts for their hash tables.  That
>> could be left for another day.
> Reminds me of 590b045c3, but I think maybe not as bad as the bucket
> array for the hash table is more likely to be a large allocation,
> which aset.c will put on a dedicated AllocBlock rather than sharing an
> AllocBlock with other chunks. ResetTupleHashTable() also just results
> in a SH_RESET(), which doesn't do any pfreeing work (just memset()).
> i.e. shouldn't cause fragmentation due to many rescans.
My concern here was just whether you could tell by MemoryContextStats
inspection how much to blame on the hash table.
            regards, tom lane