Thread: Broken resetting of subplan hash tables
Hi,

I looked again at one of the potential issues Ranier Vilela's static
analysis found, and after studying it further I still think this one is
a real bug, though my original patch was incorrect and introduced a
use-after-free bug of its own.

The code for resetting the hash tables of the SubPlanState node in
buildSubPlanHash() looks like it can never run, and additionally it
would be broken if it ever did run. This issue was introduced in commit
356687bd825e5ca7230d43c1bffe7a59ad2e77bd. As far as I can gather, the
following is true:

1. It sets node->hashtable and node->hashnulls to NULL a few lines
before checking whether they are not NULL, which means the code for
resetting them can never be reached.

2. But if we changed the code so that the ResetTupleHashTable() calls
were reachable, we would hit a typo: it resets node->hashtable twice and
never resets node->hashnulls, which would cause non-obvious bugs.

3. Additionally, since the memory context used by the hash tables is
reset in buildSubPlanHash(), if we start resetting hash tables we will
get a use-after-free bug.

I have attached a patch which makes sure the code for resetting the hash
tables is actually run, while also fixing the code that resets them.

Since the current behavior of the code in HEAD is not actually broken
(it is just an optimization which is never used), this fix does not have
to be backpatched.

Andreas
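For readers without the source at hand, here is a condensed sketch of
the logic being described (paraphrased from buildSubPlanHash();
arguments and surrounding details are omitted, so treat this as an
approximation rather than the exact code):

    /* Approximate shape of buildSubPlanHash() after the faulty commit */
    MemoryContextReset(node->hashtablecxt);
    node->hashtable = NULL;                     /* (1) cleared here ... */
    node->hashnulls = NULL;

    if (node->hashtable)                        /* ... so this is always   */
        ResetTupleHashTable(node->hashtable);   /* false and never reached */
    else
        node->hashtable = BuildTupleHashTableExt(/* ... */);

    if (!subplan->unknownEqFalse)
    {
        if (node->hashnulls)
            ResetTupleHashTable(node->hashtable);   /* (2) typo: should be
                                                     * node->hashnulls */
        else
            node->hashnulls = BuildTupleHashTableExt(/* ... */);
    }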
Andreas Karlsson <andreas@proxel.se> writes:
> The code for resetting the hash tables of the SubPlanState node in
> buildSubPlanHash() looks like it can never run, and additionally it
> would be broken if it ever did run. This issue was introduced in
> commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd.

Right.  Justin Pryzby also noted this a couple weeks back, but we didn't
deal with it at that point because we were up against a release deadline.

> As far as I can gather, the following is true:
> 1. It sets node->hashtable and node->hashnulls to NULL a few lines
> before checking whether they are not NULL, which means the code for
> resetting them can never be reached.

Yeah.  Those lines should have been removed when the ResetTupleHashTable
logic was added.

> 2. But if we changed the code so that the ResetTupleHashTable() calls
> were reachable, we would hit a typo: it resets node->hashtable twice
> and never resets node->hashnulls, which would cause non-obvious bugs.

Right.

> 3. Additionally, since the memory context used by the hash tables is
> reset in buildSubPlanHash(), if we start resetting hash tables we will
> get a use-after-free bug.

Nope, not right.  The hash table metadata is now allocated in the
es_query_cxt; what is in the hashtablecxt is just tuples, and that does
need to be cleared, per the comment for ResetTupleHashTable.  Your patch
as given results in a nasty memory leak, which is easily demonstrated
with a small mod to the regression test case I added:

select sum(ss.tst::int) from
  generate_series(1,10000000) o cross join lateral (
    select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
           random() as r
    from onek i where i.unique1 = o%1000 ) ss;

> Since the current behavior of the code in HEAD is not actually broken
> (it is just an optimization which is never used), this fix does not
> have to be backpatched.

Unfortunately ... this test case also leaks memory like mad in HEAD,
v12, and v11, because all of them are rebuilding the hash table from
scratch without clearing the old one.  So this is indeed broken and a
back-patch is necessary.

I noted while looking at this that most of the calls of
ResetTupleHashTable are actually never reached in the regression tests
(per code coverage report), so I made up some test cases that do reach
'em, and included those in the commit.

TBH, I think that this tuple table API is seriously misdesigned; it is
confusing and very error-prone to have the callers need to reset the
tuple context separately from calling ResetTupleHashTable.  Also, the
callers all look like their resets are intended to destroy the whole
hashtable, not just its contents (as indeed they were doing, before the
faulty commit).  But I didn't attempt to fix that today.

			regards, tom lane
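For clarity, the repaired shape of the logic Tom describes would be
roughly as follows (a sketch, not the committed patch text):

    /* Sketch of the fixed reset path; not the actual committed diff */
    MemoryContextReset(node->hashtablecxt);     /* frees stored tuples only;
                                                 * table metadata lives in
                                                 * es_query_cxt */
    if (node->hashtable)
        ResetTupleHashTable(node->hashtable);   /* reuse, don't rebuild */
    else
        node->hashtable = BuildTupleHashTableExt(/* ... */);

    if (!subplan->unknownEqFalse)
    {
        if (node->hashnulls)
            ResetTupleHashTable(node->hashnulls);   /* hashnulls this time */
        else
            node->hashnulls = BuildTupleHashTableExt(/* ... */);
    }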
Hi,

On 2020-02-29 14:02:59 -0500, Tom Lane wrote:
> > 3. Additionally, since the memory context used by the hash tables is
> > reset in buildSubPlanHash(), if we start resetting hash tables we
> > will get a use-after-free bug.
>
> Nope, not right.  The hash table metadata is now allocated in the
> es_query_cxt; what is in the hashtablecxt is just tuples, and that
> does need to be cleared, per the comment for ResetTupleHashTable.
> Your patch as given results in a nasty memory leak, which is easily
> demonstrated with a small mod to the regression test case I added:
>
> select sum(ss.tst::int) from
>   generate_series(1,10000000) o cross join lateral (
>     select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
>            random() as r
>     from onek i where i.unique1 = o%1000 ) ss;
>
> > Since the current behavior of the code in HEAD is not actually
> > broken (it is just an optimization which is never used), this fix
> > does not have to be backpatched.
>
> Unfortunately ... this test case also leaks memory like mad in HEAD,
> v12, and v11, because all of them are rebuilding the hash table from
> scratch without clearing the old one.  So this is indeed broken and a
> back-patch is necessary.

Yea :(. Thanks for doing that.

> I noted while looking at this that most of the calls of
> ResetTupleHashTable are actually never reached in the regression tests
> (per code coverage report), so I made up some test cases that do reach
> 'em, and included those in the commit.

Cool.

> TBH, I think that this tuple table API is seriously misdesigned; it is
> confusing and very error-prone to have the callers need to reset the
> tuple context separately from calling ResetTupleHashTable.

Do you have an alternative proposal?  Before committing the patch adding
it that way, I'd waited for quite a while asking for input...

In several cases (nodeAgg.c, nodeSetOp.c) there's memory from outside
execGrouping.c that's also allocated in the same context as the table
contents (via TupleHashTable->additional) - just resetting the context
passed to BuildTupleHashTableExt() as part of ResetTupleHashTable()
seems problematic too.

We could change it so more of the metadata for execGrouping.c is
computed outside of BuildTupleHashTableExt(), and continue to destroy
the entire hashtable.  But we'd still have to reallocate the hashtable,
the slot, etc.  So having a reset interface seems like the right thing.

I guess we could set it up so that BuildTupleHashTableExt() registers a
memory context reset callback on tablecxt, which'd reinitialize the
hashtable.  But that seems like it'd be at least as confusing?

> Also, the callers all look like their resets are intended to destroy
> the whole hashtable not just its contents (as indeed they were doing,
> before the faulty commit).  But I didn't attempt to fix that today.

Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
that to me?  Why would they want to drop the hashtable metadata when
resetting?  What am I missing?

Greetings,

Andres Freund
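To make the callback idea concrete: it would presumably build on the
existing MemoryContextRegisterResetCallback() machinery, along the lines
of the hypothetical fragments below.  Note that such callbacks are
one-shot (they are popped off the context's list before being called),
so something would also have to re-register the callback after every
reset, which arguably underlines the point about this approach being
confusing:

    /* Hypothetical illustration only; execGrouping.c has no such hook. */
    static void
    tuplehash_reset_callback(void *arg)
    {
        TupleHashTable hashtable = (TupleHashTable) arg;

        /*
         * The tuple context is about to be reset; reinitialize the table
         * metadata (which lives in a different context) to match.
         */
        ResetTupleHashTable(hashtable);
    }

    /* ... and, hypothetically, in BuildTupleHashTableExt(): */
    MemoryContextCallback *cb;

    cb = MemoryContextAlloc(metacxt, sizeof(MemoryContextCallback));
    cb->func = tuplehash_reset_callback;
    cb->arg = (void *) hashtable;
    MemoryContextRegisterResetCallback(tablecxt, cb);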
Andres Freund <andres@anarazel.de> writes:
> On 2020-02-29 14:02:59 -0500, Tom Lane wrote:
>> TBH, I think that this tuple table API is seriously misdesigned;
>> it is confusing and very error-prone to have the callers need to
>> reset the tuple context separately from calling ResetTupleHashTable.

> Do you have an alternative proposal?

I'd be inclined to let the tuple hashtable make its own tuple-storage
context and reset that for itself.  Is it really worth the complexity
and bug hazards to share such a context with other uses?

> We could change it so more of the metadata for execGrouping.c is
> computed outside of BuildTupleHashTableExt(), and continue to destroy
> the entire hashtable.  But we'd still have to reallocate the hashtable,
> the slot, etc.  So having a reset interface seems like the right thing.

Agreed, the reset interface is a good idea.  I'm just not happy that in
addition to resetting, you have to remember to reset some
vaguely-related context (and heaven help you if you reset that context
but not the hashtable).

>> Also, the callers all look like their resets are intended to destroy
>> the whole hashtable not just its contents (as indeed they were doing,
>> before the faulty commit).  But I didn't attempt to fix that today.

> Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
> that to me?  Why would they want to drop the hashtable metadata when
> resetting?  What am I missing?

They may not look like it to you; but Andreas misread that, and so did I
at first --- not least because that *is* how it used to work, and there
are still comments suggesting that that's how it works, e.g. this in
ExecInitRecursiveUnion:

     * If hashing, we need a per-tuple memory context for comparisons,
     * and a longer-lived context to store the hash table.  The table
     * can't just be kept in the per-query context because we want to be
     * able to throw it away when rescanning.

"throw it away" sure looks like it means the entire hashtable, not just
its tuple contents.

			regards, tom lane
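To make Tom's alternative concrete, here is a hypothetical sketch (not
how execGrouping.c actually works today) in which the hashtable creates
and owns its tuple-storage context, so that a single call clears both
the entries and the stored tuples:

    /* Hypothetical: BuildTupleHashTableExt() makes a private tuple
     * context instead of taking a caller-supplied tablecxt. */
    hashtable->tablecxt = AllocSetContextCreate(metacxt,
                                                "TupleHashTable tuples",
                                                ALLOCSET_DEFAULT_SIZES);

    /* ResetTupleHashTable() could then clean up everything itself. */
    void
    ResetTupleHashTable(TupleHashTable hashtable)
    {
        tuplehash_reset(hashtable->hashtab);        /* drop the entries */
        MemoryContextReset(hashtable->tablecxt);    /* and the tuples */
    }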
On Sat, Feb 29, 2020 at 18:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"throw it away" sure looks like it means the entire hashtable, not just
its tuple contents.
I don't know if I can comment clearly enough to help, but in my
experience destroying and rebuilding a hashtable is wasteful if it is
possible to reset it instead.

By analogy, in my own code with arrays I reuse them with a single line
instead of reconstructing them:

    a->nelts = 0;  /* reset array */

If possible, doing the same for hashtables would be great.
regards,
Ranier Vilela