Thread: replication cleanup code incorrect way to use of HTAB HASH_REMOVE ?
Hi, I found some dubious looking HTAB cleanup code for replication streams (see file:worker.c, function:stream_cleanup_files). viz. ---------- static void stream_cleanup_files(Oid subid, TransactionId xid) { char path[MAXPGPATH]; StreamXidHash *ent; /* Remove the xid entry from the stream xid hash */ ent = (StreamXidHash *) hash_search(xidhash, (void *) &xid, HASH_REMOVE, NULL); /* By this time we must have created the transaction entry */ Assert(ent != NULL); /* Delete the change file and release the stream fileset memory */ changes_filename(path, subid, xid); SharedFileSetDeleteAll(ent->stream_fileset); pfree(ent->stream_fileset); ent->stream_fileset = NULL; /* Delete the subxact file and release the memory, if it exist */ if (ent->subxact_fileset) { subxact_filename(path, subid, xid); SharedFileSetDeleteAll(ent->subxact_fileset); pfree(ent->subxact_fileset); ent->subxact_fileset = NULL; } } ---------- Notice how the code calls hash_search(... HASH_REMOVE ...), but then it deferences the same ent that was returned from that function. IIUC that is a violation of the hash_search API, whose function comment (dynahash.c) clearly says not to use the return value in such a way: ---------- * Return value is a pointer to the element found/entered/removed if any, * or NULL if no match was found. (NB: in the case of the REMOVE action, * the result is a dangling pointer that shouldn't be dereferenced!) ---------- ~~ PSA my patch to correct this by firstly doing a HASH_FIND, then only HASH_REMOVE after we've finished using the ent. ---- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Sat, Mar 20, 2021 at 12:54 PM Peter Smith <smithpb2250@gmail.com> wrote: > > PSA my patch to correct this by firstly doing a HASH_FIND, then only > HASH_REMOVE after we've finished using the ent. > Why can't we keep using HASH_REMOVE as it is but get the output (entry found or not) in the last parameter of hash_search API and then perform Assert based on that? See similar usage in reorderbuffer.c and rewriteheap.c. -- With Regards, Amit Kapila.
On Sun, Mar 21, 2021 at 8:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Mar 20, 2021 at 12:54 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > PSA my patch to correct this by firstly doing a HASH_FIND, then only > > HASH_REMOVE after we've finished using the ent. > > > > Why can't we keep using HASH_REMOVE as it is but get the output (entry > found or not) in the last parameter of hash_search API and then > perform Assert based on that? See similar usage in reorderbuffer.c and > rewriteheap.c. > Changing the Assert doesn't do anything to fix the problem as described, i.e. dereferencing of ent after the HASH_REMOVE. The real problem isn't the Assert. It's all those other usages of ent disobeying the API rule: "(NB: in the case of the REMOVE action, the result is a dangling pointer that shouldn't be dereferenced!)" e.g. - SharedFileSetDeleteAll(ent->stream_fileset); - pfree(ent->stream_fileset); - ent->stream_fileset = NULL; - if (ent->subxact_fileset) - SharedFileSetDeleteAll(ent->subxact_fileset); - pfree(ent->subxact_fileset); - ent->subxact_fileset = NULL; ------ Kind Regards, Peter Smith Fujitsu Australia.
On Mon, Mar 22, 2021 at 10:50 AM Peter Smith <smithpb2250@gmail.com> wrote: > The real problem isn't the Assert. It's all those other usages of ent > disobeying the API rule: "(NB: in the case of the REMOVE action, the > result is a dangling pointer that shouldn't be dereferenced!)" I suppose the HASH_REMOVE case could clobber the object with 0x7f if CLOBBER_FREED_MEMORY is defined (typically assertion builds), or alternatively return some other non-NULL but poisoned pointer, so that problems of this ilk blow up in early testing.
On Mon, Mar 22, 2021 at 9:21 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Mon, Mar 22, 2021 at 10:50 AM Peter Smith <smithpb2250@gmail.com> wrote: > > The real problem isn't the Assert. It's all those other usages of ent > > disobeying the API rule: "(NB: in the case of the REMOVE action, the > > result is a dangling pointer that shouldn't be dereferenced!)" > > I suppose the HASH_REMOVE case could clobber the object with 0x7f if > CLOBBER_FREED_MEMORY is defined (typically assertion builds), or > alternatively return some other non-NULL but poisoned pointer, so that > problems of this ilk blow up in early testing. +1, but not sure if the poisoned ptr alternative can work because some code (e.g see RemoveTargetIfNoLongerUsed function) is asserting the return ptr actual value, not just its NULL-ness. ------ Kind Regards, Peter Smith. Fujitsu Australia.
Hi, On 2021-03-22 11:20:37 +1300, Thomas Munro wrote: > On Mon, Mar 22, 2021 at 10:50 AM Peter Smith <smithpb2250@gmail.com> wrote: > > The real problem isn't the Assert. It's all those other usages of ent > > disobeying the API rule: "(NB: in the case of the REMOVE action, the > > result is a dangling pointer that shouldn't be dereferenced!)" Right that's clearly not ok. > I suppose the HASH_REMOVE case could clobber the object with 0x7f if > CLOBBER_FREED_MEMORY is defined (typically assertion builds) Yea. Plus VALGRIND_MAKE_MEM_NOACCESS() (requiring a VALGRIND_MAKE_MEM_UNDEFINED() when reusing) or at least a VALGRIND_MAKE_MEM_UNDEFINED(). >or alternatively return some other non-NULL but poisoned pointer, so >that problems of this ilk blow up in early testing. IMO it's just a bad API to combine the different use cases into hash_search(). It's kinda defensible to have FIND/ENTER/ENTER_NULL go through the same function (although I do think it makes code harder to read), but having HASH_REMOVE is just wrong. The only reason for returning a dangling pointer is that that's the obvious way to check if something was found. If they weren't combined we could tell newer compilers that the memory shouldn't be accessed after the HASH_REMOVE anymore. And it'd remove some unnecessary branches in performance critical code... But I guess making dynahash not terrible from that POV basically means replacing all of dynahash. Having all those branches for partitioned hashes, actions are really not great. Greetings, Andres Freund
On Mon, Mar 22, 2021 at 3:20 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sun, Mar 21, 2021 at 8:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Mar 20, 2021 at 12:54 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > PSA my patch to correct this by firstly doing a HASH_FIND, then only > > > HASH_REMOVE after we've finished using the ent. > > > > > > > Why can't we keep using HASH_REMOVE as it is but get the output (entry > > found or not) in the last parameter of hash_search API and then > > perform Assert based on that? See similar usage in reorderbuffer.c and > > rewriteheap.c. > > > > Changing the Assert doesn't do anything to fix the problem as > described, i.e. dereferencing of ent after the HASH_REMOVE. > > The real problem isn't the Assert. It's all those other usages of ent > disobeying the API rule: "(NB: in the case of the REMOVE action, the > result is a dangling pointer that shouldn't be dereferenced!)" > Right, that is a problem. I see that your patch will fix it. Thanks. -- With Regards, Amit Kapila.
On Mon, Mar 22, 2021 at 7:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 22, 2021 at 3:20 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Sun, Mar 21, 2021 at 8:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Mar 20, 2021 at 12:54 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > > PSA my patch to correct this by firstly doing a HASH_FIND, then only > > > > HASH_REMOVE after we've finished using the ent. > > > > > > > > > > Why can't we keep using HASH_REMOVE as it is but get the output (entry > > > found or not) in the last parameter of hash_search API and then > > > perform Assert based on that? See similar usage in reorderbuffer.c and > > > rewriteheap.c. > > > > > > > Changing the Assert doesn't do anything to fix the problem as > > described, i.e. dereferencing of ent after the HASH_REMOVE. > > > > The real problem isn't the Assert. It's all those other usages of ent > > disobeying the API rule: "(NB: in the case of the REMOVE action, the > > result is a dangling pointer that shouldn't be dereferenced!)" > > > > Right, that is a problem. I see that your patch will fix it. Thanks. > Pushed your patch. -- With Regards, Amit Kapila.