Thread: Is element access after HASH_REMOVE ever OK?
Hi, After hearing from a couple of directions about systems spending too much time scanning the local lock hash table, I wrote the trivial patch to put them in a linked list, before learning that people have considered that before, so I should probably go and read some history on that and find out why it hasn't been done... However, I noticed in passing that RemoveLocalLock() accesses *locallock after removing it from the hash table (in assertion builds only). So one question I have is whether it's actually a programming rule that you can't do that (at most you can compare the pointer against NULL), or whether it's supposed to be safe-if-you-know-what-you're-doing, as the existing comments hints. Here also is a patch that does wipe_mem on removed elements, as threatened last time this topic came up[1], which reveals the problem. I'm also not exactly sure why it's only a WARNING if your local lock table is out of sync, but perhaps that's in the archives too. [1] https://www.postgresql.org/message-id/flat/CAHut%2BPs-pL%2B%2Bf6CJwPx2%2BvUqXuew%3DXt-9Bi-6kCyxn%2BFwi2M7w%40mail.gmail.com
Attachment
Thomas Munro <thomas.munro@gmail.com> writes: > However, I noticed in passing that RemoveLocalLock() accesses > *locallock after removing it from the hash table (in assertion builds > only). So one question I have is whether it's actually a programming > rule that you can't do that (at most you can compare the pointer > against NULL), or whether it's supposed to be > safe-if-you-know-what-you're-doing, as the existing comments hints. I'd say it's, at best, unwarranted familiarity with the dynahash implementation ... > Here also is a patch that does wipe_mem on removed elements, as > threatened last time this topic came up[1], which reveals the problem. ... one good reason being that it'll fail under this sort of entirely-reasonable debugging aid. Can we get rid of the unsafe access easily? regards, tom lane
I wrote: > ... Can we get rid of the unsafe > access easily? Oh, shoulda read your second patch first. Looking at that, I fear it might not be quite that simple, because the comment on CheckAndSetLockHeld says very clearly * It is callers responsibility that this function is called after * acquiring/releasing the relation extension/page lock. so your proposed patch violates that specification. I'm inclined to think that this API spec is very poorly thought out and should be changed --- why is it that the flags should change *after* the lock change in both directions? But we'd have to take a look at the usage of these flags to understand what's going on exactly. regards, tom lane
Hi, On 2021-05-10 20:15:41 -0400, Tom Lane wrote: > I wrote: > > ... Can we get rid of the unsafe > > access easily? > > Oh, shoulda read your second patch first. Looking at that, > I fear it might not be quite that simple, because the > comment on CheckAndSetLockHeld says very clearly > > * It is callers responsibility that this function is called after > * acquiring/releasing the relation extension/page lock. > > so your proposed patch violates that specification. It wouldn't be too hard to fix this though - we can just copy the locktag into a local variable. Or use one of the existing local copies, higher in the stack. But: > I'm inclined to think that this API spec is very poorly thought out > and should be changed --- why is it that the flags should change > *after* the lock change in both directions? But we'd have to take > a look at the usage of these flags to understand what's going on > exactly. I can't see a need to do it after the HASH_REMOVE at least - as we don't return early if that fails, there's no danger getting out of sync if we reverse the order. I think the comment could just be changed to say that the function has to be called after it is inevitable that the lock is acquired/released. Greetings, Andres Freund