Thread: Broken resetting of subplan hash tables

From: Andreas Karlsson

Hi,

I looked again at one of the potential issues Ranier Vilela's static 
analysis found, and I still think this one is a real bug. But my 
original patch was incorrect: it introduced a use-after-free bug.

The code for resetting the hash tables of the SubPlanState node in 
buildSubPlanHash() looks like it can never run, and additionally it 
would be broken if it ever did run. This issue was introduced in 
commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd.

As far as I gather, the following is true (a code sketch follows the 
list):

1. It sets node->hashtable and node->hashnulls to NULL a few lines 
before checking if they are not NULL, which means the code for resetting 
them can never be reached.

2. But if we changed the code so that the ResetTupleHashTable() calls 
were reachable, we would hit a typo: it resets node->hashtable twice and 
never resets node->hashnulls, which would cause non-obvious bugs.

3. Additionally, since the memory context used by the hash tables is 
reset in buildSubPlanHash(), if we start resetting hash tables we will 
get a use-after-free bug.
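
To make points 1 and 2 concrete, here is a rough sketch of the control 
flow in buildSubPlanHash() (simplified, with argument lists elided; see 
the real source for full context):

    node->hashtable = NULL;                   /* (1) cleared here ... */
    node->hashnulls = NULL;

    MemoryContextReset(node->hashtablecxt);

    if (node->hashtable)                      /* ... so this is never true */
        ResetTupleHashTable(node->hashtable);
    else
        node->hashtable = BuildTupleHashTableExt(...);

    if (node->hashnulls)                      /* never true either */
        ResetTupleHashTable(node->hashtable); /* (2) typo: wrong table */
    else
        node->hashnulls = BuildTupleHashTableExt(...);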

I have attached a patch which makes sure the code for resetting the 
hash tables is actually run, while also fixing the resetting code 
itself.

Since the current behavior of the code in HEAD is not actually broken 
(it is just an optimization which is not used), this fix does not have 
to be backpatched.

Andreas


Re: Broken resetting of subplan hash tables

From: Tom Lane
Date: 2020-02-29 14:02:59 -0500

Andreas Karlsson <andreas@proxel.se> writes:
> The code for resetting the hash tables of the SubPlanState node in 
> buildSubPlanHash() looks like it can never run, and additionally it 
> would be broken if it ever did run. This issue was introduced in 
> commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd.

Right.  Justin Pryzby also noted this a couple weeks back, but we
didn't deal with it at that point because we were up against a
release deadline.

> As far as I gather, the following is true:

> 1. It sets node->hashtable and node->hashnulls to NULL a few lines 
> before checking if they are not NULL, which means the code for resetting 
> them can never be reached.

Yeah.  Those lines should have been removed when the ResetTupleHashTable
logic was added.

> 2. But if we changed the code so that the ResetTupleHashTable() calls 
> were reachable, we would hit a typo: it resets node->hashtable twice and 
> never resets node->hashnulls, which would cause non-obvious bugs.

Right.

> 3. Additionally, since the memory context used by the hash tables is 
> reset in buildSubPlanHash(), if we start resetting hash tables we will 
> get a use-after-free bug.

Nope, not right.  The hash table metadata is now allocated in the
es_query_cxt; what is in the hashtablecxt is just tuples, and that
does need to be cleared, per the comment for ResetTupleHashTable.
Your patch as given results in a nasty memory leak, which is easily
demonstrated with a small mod to the regression test case I added:

select sum(ss.tst::int) from
  generate_series(1,10000000) o cross join lateral (
    select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
                     random() as r
    from onek i where i.unique1 = o%1000 ) ss;

> Since the current behavior of the code in HEAD is not actually broken 
> (it is just an optimization which is not used), this fix does not have 
> to be backpatched.

Unfortunately ... this test case also leaks memory like mad in
HEAD, v12, and v11, because all of them are rebuilding the hash
table from scratch without clearing the old one.  So this is
indeed broken and a back-patch is necessary.

I noted while looking at this that most of the calls of
ResetTupleHashTable are actually never reached in the regression
tests (per code coverage report) so I made up some test cases
that do reach 'em, and included those in the commit.

TBH, I think that this tuple table API is seriously misdesigned;
it is confusing and very error-prone to have the callers need to
reset the tuple context separately from calling ResetTupleHashTable.
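
For concreteness, the two-step dance every caller has to remember looks 
roughly like this (a sketch of the pattern, not any one caller 
verbatim):

    MemoryContextReset(hashtablecxt);   /* frees the stored tuples */
    ResetTupleHashTable(hashtable);     /* empties the table structure */

Forget either half and you leak tuples or leave dangling entries.
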
Also, the callers all look like their resets are intended to destroy
the whole hashtable not just its contents (as indeed they were doing,
before the faulty commit).  But I didn't attempt to fix that today.

            regards, tom lane



Re: Broken resetting of subplan hash tables

From: Andres Freund

Hi,

On 2020-02-29 14:02:59 -0500, Tom Lane wrote:
> > 3. Additionally, since the memory context used by the hash tables is 
> > reset in buildSubPlanHash(), if we start resetting hash tables we will 
> > get a use-after-free bug.
>
> Nope, not right.  The hash table metadata is now allocated in the
> es_query_cxt; what is in the hashtablecxt is just tuples, and that
> does need to be cleared, per the comment for ResetTupleHashTable.
> Your patch as given results in a nasty memory leak, which is easily
> demonstrated with a small mod to the regression test case I added:

> select sum(ss.tst::int) from
>   generate_series(1,10000000) o cross join lateral (
>     select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
>                      random() as r
>     from onek i where i.unique1 = o%1000 ) ss;
> 
> > Since the current behavior of the code in HEAD is not actually broken 
> > (it is just an optimization which is not used), this fix does not have 
> > to be backpatched.
> 
> Unfortunately ... this test case also leaks memory like mad in
> HEAD, v12, and v11, because all of them are rebuilding the hash
> table from scratch without clearing the old one.  So this is
> indeed broken and a back-patch is necessary.

Yea :(. Thanks for doing that.


> I noted while looking at this that most of the calls of
> ResetTupleHashTable are actually never reached in the regression
> tests (per code coverage report) so I made up some test cases
> that do reach 'em, and included those in the commit.

Cool.


> TBH, I think that this tuple table API is seriously misdesigned;
> it is confusing and very error-prone to have the callers need to
> reset the tuple context separately from calling ResetTupleHashTable.

Do you have an alternative proposal? Before committing the patch adding
it that way, I'd waited for quite a while asking for input...  In
several cases (nodeAgg.c, nodeSetOp.c) there's memory from outside
execGrouping.c that's also allocated in the same context as the table
contents (via TupleHashTable->additional) - just resetting the context
passed to BuildTupleHashTableExt() as part of ResetTupleHashTable()
seems problematic too.
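
To illustrate the kind of sharing I mean, a simplified sketch loosely 
modeled on nodeAgg.c's hash entry creation:

    entry = LookupTupleHashEntry(perhash->hashtable, hashslot, &isnew);
    if (isnew)
    {
        /* per-group working state lives in the same context that
         * also holds the table's stored tuples */
        entry->additional = (AggStatePerGroup)
            MemoryContextAlloc(perhash->hashtable->tablecxt,
                               sizeof(AggStatePerGroupData) * aggstate->numtrans);
    }

Resetting that context behind the caller's back would free 
entry->additional as well.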

We could change it so more of the metadata for execGrouping.c is
computed outside of BuildTupleHashTableExt(), and continue to destroy
the entire hashtable. But we'd still have to reallocate the hashtable,
the slot, etc.  So having a reset interface seems like the right thing.

I guess we could set it up so that BuildTupleHashTableExt() registers a
memory context reset callback on tablecxt, which'd reinitialize the
hashtable. But that seems like it'd be at least as confusing?
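
Shape-wise, something like this (an untested sketch; note that context 
reset callbacks are one-shot, so it would also have to re-register 
itself somehow):

    static void
    tuplehash_reset_callback(void *arg)
    {
        TupleHashTable hashtable = (TupleHashTable) arg;

        ResetTupleHashTable(hashtable);
    }

    ...
    cb->func = tuplehash_reset_callback;
    cb->arg = (void *) hashtable;
    MemoryContextRegisterResetCallback(tablecxt, cb);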


> Also, the callers all look like their resets are intended to destroy
> the whole hashtable not just its contents (as indeed they were doing,
> before the faulty commit).  But I didn't attempt to fix that today.

Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
that to me? Why would they want to drop the hashtable metadata when
resetting? What am I missing?

Greetings,

Andres Freund



Re: Broken resetting of subplan hash tables

From: Tom Lane

Andres Freund <andres@anarazel.de> writes:
> On 2020-02-29 14:02:59 -0500, Tom Lane wrote:
>> TBH, I think that this tuple table API is seriously misdesigned;
>> it is confusing and very error-prone to have the callers need to
>> reset the tuple context separately from calling ResetTupleHashTable.

> Do you have an alternative proposal?

I'd be inclined to let the tuple hashtable make its own tuple-storage
context and reset that for itself.  Is it really worth the complexity
and bug hazards to share such a context with other uses?
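
Roughly (a sketch of the idea, not a worked-out patch):

    /* BuildTupleHashTableExt() makes a private tuple context ... */
    hashtable->tablecxt = AllocSetContextCreate(metacxt,
                                                "TupleHashTable tuples",
                                                ALLOCSET_DEFAULT_SIZES);

    /* ... so that ResetTupleHashTable() can do the whole job itself */
    void
    ResetTupleHashTable(TupleHashTable hashtable)
    {
        MemoryContextReset(hashtable->tablecxt);
        tuplehash_reset(hashtable->hashtab);
    }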

> We could change it so more of the metadata for execGrouping.c is
> computed outside of BuildTupleHashTableExt(), and continue to destroy
> the entire hashtable. But we'd still have to reallocate the hashtable,
> the slot, etc.  So having a reset interface seems like the right thing.

Agreed, the reset interface is a good idea.  I'm just not happy that
in addition to resetting, you have to remember to reset some
vaguely-related context (and heaven help you if you reset that context
but not the hashtable).
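
Schematically, the hazard is:

    MemoryContextReset(hashtablecxt);   /* stored tuples are gone ... */
    /* ... but the table still holds entries whose entry->firstTuple
     * pointed into hashtablecxt, so the next probe that visits an
     * existing entry reads freed memory */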

>> Also, the callers all look like their resets are intended to destroy
>> the whole hashtable not just its contents (as indeed they were doing,
>> before the faulty commit).  But I didn't attempt to fix that today.

> Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
> that to me? Why would they want to drop the hashtable metadata when
> resetting? What am I missing?

They may not look like it to you; but Andreas misread that, and so did
I at first --- not least because that *is* how it used to work, and
there are still comments suggesting that that's how it works, eg this
in ExecInitRecursiveUnion:

     * If hashing, we need a per-tuple memory context for comparisons, and a
     * longer-lived context to store the hash table.  The table can't just be
     * kept in the per-query context because we want to be able to throw it
     * away when rescanning.

"throw it away" sure looks like it means the entire hashtable, not just
its tuple contents.

            regards, tom lane



Re: Broken resetting of subplan hash tables

From: Ranier Vilela

On Sat, Feb 29, 2020 at 18:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"throw it away" sure looks like it means the entire hashtable, not just
its tuple contents.
I don't know if I can comment clearly enough to help, but from my 
experience, destroying and rebuilding the hashtable is a waste whenever 
resetting it is possible.

By analogy, I have code with arrays where I reuse them with a single 
line instead of reconstructing them:

    a->nelts = 0;   /* reset array */

If possible, doing the same for hashtables would be great.

regards,
Ranier Vilela