Re: 回复:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset - Mailing list pgsql-bugs

From Tom Lane
Subject Re: 回复:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
Date
Msg-id 160523.1757190713@sss.pgh.pa.us
Whole thread Raw
In response to 回复:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset  ("李海洋(陌痕)" <mohen.lhy@alibaba-inc.com>)
Responses 回复:回复:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
List pgsql-bugs
"=?UTF-8?B?5p2O5rW35rSLKOmZjOeXlSk=?=" <mohen.lhy@alibaba-inc.com> writes:
> On 2025-09-03 23:35 Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I wonder why ExecInitSubPlan is making a separate context for this at all,
>> rather than using some surrounding short-lived context.

> This behavior was introduced by commit 133924e to fix one bug. AFAICS, the
> tempcxt is only used by hashfunc evaluation. We should reset tempcxt after per
> hashtable find if tempcxt is a separate context.

I thought it unlikely that this leak has been there unreported since
2010, so I tried the test case in older branches and soon found that
v10 and older don't exhibit the leak, or at least it's much less
virulent there.  A "git bisect" session found that the behavior
changed at

    bf6c614a2f2c58312b3be34a47e7fb7362e07bcb is the first bad commit
    commit bf6c614a2f2c58312b3be34a47e7fb7362e07bcb
    Author: Andres Freund <andres@anarazel.de>
    Date:   Thu Feb 15 21:55:31 2018 -0800

        Do execGrouping.c via expression eval machinery, take two.

Before that commit, TupleHashTableMatch reset hashtable->tempcxt
(via execTuplesMatch).  Now what it resets is a different context
hashtable->exprcontext->ecxt_per_tuple_memory, and so there's no
reset of hashtable->tempcxt anywhere in this particular loop.

So that leads me to not trust fixing this in nodeSubplan.c,
because that's just one caller that can reach TupleHashTableMatch:
assuming that there are no other similar leaks seems dangerous.

I experimented with putting MemoryContextReset(hashtable->tempcxt)
into TupleHashTableMatch, and that does stop this leak.  But even
though that's a one-line fix, I don't like that solution either,
for two reasons:

1. TupleHashTableMatch is likely to be invoked multiple times per
hashtable lookup, so that this way results in many more resets
than are really necessary.

2. It's not entirely clear that this way can't clobber data we
still need, since the hash function outputs are surely longer-lived
than individual tuple matching operations.

After contemplating things for awhile, I think that feichanghong's
idea is the right one after all: in each of the functions that switch
into hashtable->tempcxt, let's do a reset on the way out, as attached.
That's straightforward and visibly related to the required data
lifespan.

Interestingly, both in pre-v11 branches and with the one-line fix,
I notice that the test query's memory consumption bounces around a
little (10MB or so), while it seems absolutely steady with the
attached.  I interpret that to mean that we weren't resetting
the tempcxt quite often enough, so that there was some amount of
leakage in between calls of TupleHashTableMatch, even though we
got there often enough to avoid disaster in this particular test.
That's another reason to prefer this way over other solutions,
I think.

            regards, tom lane

diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index b5400749353..600d1ae5a96 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -316,6 +316,7 @@ LookupTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
     Assert(entry == NULL || entry->hash == local_hash);

     MemoryContextSwitchTo(oldContext);
+    MemoryContextReset(hashtable->tempcxt);

     return entry;
 }
@@ -338,6 +339,7 @@ TupleHashTableHash(TupleHashTable hashtable, TupleTableSlot *slot)
     hash = TupleHashTableHash_internal(hashtable->hashtab, NULL);

     MemoryContextSwitchTo(oldContext);
+    MemoryContextReset(hashtable->tempcxt);

     return hash;
 }
@@ -365,6 +367,7 @@ LookupTupleHashEntryHash(TupleHashTable hashtable, TupleTableSlot *slot,
     Assert(entry == NULL || entry->hash == hash);

     MemoryContextSwitchTo(oldContext);
+    MemoryContextReset(hashtable->tempcxt);

     return entry;
 }
@@ -398,7 +401,9 @@ FindTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
     /* Search the hash table */
     key = NULL;                    /* flag to reference inputslot */
     entry = tuplehash_lookup(hashtable->hashtab, key);
+
     MemoryContextSwitchTo(oldContext);
+    MemoryContextReset(hashtable->tempcxt);

     return entry;
 }

pgsql-bugs by date:

Previous
From: Álvaro Herrera
Date:
Subject: Re: Broken PQtrace CopyData display
Next
From: "李海洋(陌痕)"
Date:
Subject: 回复:回复:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset