Thread: Is element access after HASH_REMOVE ever OK?

Is element access after HASH_REMOVE ever OK?

From

Thomas Munro

Date:

10 May 2021, 23:37:04

Hi,

After hearing from a couple of directions about systems spending too
much time scanning the local lock hash table, I wrote the trivial
patch to put them in a linked list, before learning that people have
considered that before, so I should probably go and read some history
on that and find out why it hasn't been done...

However, I noticed in passing that RemoveLocalLock() accesses
*locallock after removing it from the hash table (in assertion builds
only).  So one question I have is whether it's actually a programming
rule that you can't do that (at most you can compare the pointer
against NULL), or whether it's supposed to be
safe-if-you-know-what-you're-doing, as the existing comments hints.
Here also is a patch that does wipe_mem on removed elements, as
threatened last time this topic came up[1], which reveals the problem.
I'm also not exactly sure why it's only a WARNING if your local lock
table is out of sync, but perhaps that's in the archives too.

[1]
https://www.postgresql.org/message-id/flat/CAHut%2BPs-pL%2B%2Bf6CJwPx2%2BvUqXuew%3DXt-9Bi-6kCyxn%2BFwi2M7w%40mail.gmail.com

Attachment

Re: Is element access after HASH_REMOVE ever OK?

From

Tom Lane

Date:

11 May 2021, 00:06:10

Thomas Munro <thomas.munro@gmail.com> writes:
> However, I noticed in passing that RemoveLocalLock() accesses
> *locallock after removing it from the hash table (in assertion builds
> only).  So one question I have is whether it's actually a programming
> rule that you can't do that (at most you can compare the pointer
> against NULL), or whether it's supposed to be
> safe-if-you-know-what-you're-doing, as the existing comments hints.

I'd say it's, at best, unwarranted familiarity with the dynahash
implementation ...

> Here also is a patch that does wipe_mem on removed elements, as
> threatened last time this topic came up[1], which reveals the problem.

... one good reason being that it'll fail under this sort of
entirely-reasonable debugging aid.  Can we get rid of the unsafe
access easily?

            regards, tom lane

Re: Is element access after HASH_REMOVE ever OK?

From

Tom Lane

Date:

11 May 2021, 00:15:41

I wrote:
> ...  Can we get rid of the unsafe
> access easily?

Oh, shoulda read your second patch first.  Looking at that,
I fear it might not be quite that simple, because the
comment on CheckAndSetLockHeld says very clearly

 * It is callers responsibility that this function is called after
 * acquiring/releasing the relation extension/page lock.

so your proposed patch violates that specification.

I'm inclined to think that this API spec is very poorly thought out
and should be changed --- why is it that the flags should change
*after* the lock change in both directions?  But we'd have to take
a look at the usage of these flags to understand what's going on
exactly.

            regards, tom lane

Re: Is element access after HASH_REMOVE ever OK?

From

Andres Freund

Date:

11 May 2021, 01:27:55

Hi,

On 2021-05-10 20:15:41 -0400, Tom Lane wrote:
> I wrote:
> > ...  Can we get rid of the unsafe
> > access easily?
>
> Oh, shoulda read your second patch first.  Looking at that,
> I fear it might not be quite that simple, because the
> comment on CheckAndSetLockHeld says very clearly
>
>  * It is callers responsibility that this function is called after
>  * acquiring/releasing the relation extension/page lock.
>
> so your proposed patch violates that specification.

It wouldn't be too hard to fix this though - we can just copy the
locktag into a local variable. Or use one of the existing local copies,
higher in the stack.

But:

> I'm inclined to think that this API spec is very poorly thought out
> and should be changed --- why is it that the flags should change
> *after* the lock change in both directions?  But we'd have to take
> a look at the usage of these flags to understand what's going on
> exactly.

I can't see a need to do it after the HASH_REMOVE at least - as we don't
return early if that fails, there's no danger getting out of sync if we
reverse the order.  I think the comment could just be changed to say
that the function has to be called after it is inevitable that the lock
is acquired/released.

Greetings,

Andres Freund