Re: Buffer locking is special (hints, checksums, AIO writes) - Mailing list pgsql-hackers
| From | Andres Freund |
|---|---|
| Subject | Re: Buffer locking is special (hints, checksums, AIO writes) |
| Date | |
| Msg-id | uuukiok2b5gbbae6blokwim2ohxxvp5235epd5zrbc5d3fojb6@vsqee77athye Whole thread Raw |
| In response to | Re: Buffer locking is special (hints, checksums, AIO writes) (Melanie Plageman <melanieplageman@gmail.com>) |
| Responses |
Re: Buffer locking is special (hints, checksums, AIO writes)
|
| List | pgsql-hackers |
Hi,
On 2025-11-25 15:02:02 -0500, Melanie Plageman wrote:
> On Tue, Nov 25, 2025 at 11:54 AM Andres Freund <andres@anarazel.de> wrote:
> > > > As of this commit nothing uses the new share-exclusive lock mode. It will be
> > > > documented and used in a future commit. It seemed too complicated to introduce
> > > > the lock-level in a separate commit.
> > >
> > > I would have liked this mentioned earlier in the commit message.
> >
> > Hm, ok. I could split share-exclusive out again, but it was somewhat painful,
> > because it doesn't just lead to adding code, but to changing code.
>
> I don't think you need to introduce share-exclusive in a separate
> commit. I just mean it would be good to mention in the commit message
> before you get into so many other details that this commit doesn't use
> the share-exclusive level yet.
>
> > > Also, I don't know how I feel about it being "documented" in a future
> > > commit...perhaps just don't say that.
> >
> > Hm, why?
>
> I don't really see you documenting share-exclusive lock semantics in a
> later commit. Documentation for what the lock level is should be in
> the same commit that introduces it -- which I think you mostly have
> done. I feel like it is weird to say you will document that lock level
> later when 1) you don't really do that and 2) this commit mostly
> (besides some requests I had for elaboration) already does that.
The problem I faced was that the explanation for the lock level depends on
later changes - how do you document why share-exclusive is useful without
explaining the hint bit thing, which isn't yet implemented as of that commit.
> > > diff --git a/src/include/storage/buf_internals.h
> > > b/src/include/storage/buf_internals.h
> > > index 28519ad2813..0a145d95024 100644
> > > --- a/src/include/storage/buf_internals.h
> > > +++ b/src/include/storage/buf_internals.h
> > > @@ -32,22 +33,29 @@
> > > /*
> > > * Buffer state is a single 64-bit variable where following data is combined.
> > > *
> > > + * State of the buffer itself:
> > > * - 18 bits refcount
> > > * - 4 bits usage count
> > > * - 10 bits of flags
> > > *
> > > + * State of the content lock:
> > > + * - 1 bit has_waiter
> > > + * - 1 bit release_ok
> > > + * - 1 bit lock state locked
> > >
> > > Somewhere you should clearly explain the scenarios in which you still
> > > need to take the buffer header lock with LockBufHdr() vs when you can
> > > just use atomic operations/CAS loops.
> >
> > There is an explanation of that, or at least my attempt at it ;). See the
> > dcoumentation for BufferDesc.
>
> Hmm. I suppose. I was imagining something above the state member since
> you have added to it, but okay.
>
> Separately, I'll admit I don't quite understand when I have to use
> LockBufHdr() and when I should use a CAS loop to update the
> bufferdesc->state.
It's definitely subtle :(. Not sure what to do about that, other than to work
on eventually just making everything doable with just a CAS. But that's a fair
bit of future work away.
> > > @@ -6625,7 +7295,25 @@ ResOwnerReleaseBufferPin(Datum res)
> > > if (BufferIsLocal(buffer))
> > > UnpinLocalBufferNoOwner(buffer);
> > > else
> > > + {
> > > + PrivateRefCountEntry *ref;
> > > +
> > > + ref = GetPrivateRefCountEntry(buffer, false);
> > > +
> > > + /*
> > > + * If the buffer was locked at the time of the resowner release,
> > > + * release the lock now. This should only happen after errors.
> > > + */
> > > + if (ref->data.lockmode != BUFFER_LOCK_UNLOCK)
> > > + {
> > > + BufferDesc *buf = GetBufferDescriptor(buffer - 1);
> > > +
> > > + HOLD_INTERRUPTS(); /* match the upcoming RESUME_INTERRUPTS */
> > > + BufferLockUnlock(buffer, buf);
> > > + }
> > > +
> > > UnpinBufferNoOwner(GetBufferDescriptor(buffer - 1));
> > > + }
> > > }
> > >
> > > Bit confusing that ResOwnerReleaseBufferBin() now releases locks as well.
> >
> > Do you have a better suggestion? I'll add a comment that makes that explicit,
> > but other than that I don't have a great idea. Renaming the whole buffer pin
> > mechanism seems pretty noisy.
>
> ResOwnerReleaseBuffer()?
>
> What do you mean renaming the whole buffer pin mechanism?
All the *PrivateRefCount* stuff arguably aught to be renamed...
> > > I presume you didn't do this because HeapTupleSatisifiesMVCC() for
> > > SNAPSHOT_MVCC calls the non-batch version (and, of course,
> > > HeapTupleSatisifiesVisibility() is the much more common case).
> > >
> > > if (TransactionIdIsValid(xid))
> > > {
> > > if (BufferIsPermanent(buffer))
> > > {
> > > /* NB: xid must be known committed here! */
> > > XLogRecPtr commitLSN = TransactionIdGetCommitLSN(xid);
> > >
> > > if (XLogNeedsFlush(commitLSN) &&
> > > BufferGetLSNAtomic(buffer) < commitLSN)
> > > {
> > > /* not flushed and no LSN interlock, so don't
> > > set hint */
> > > return; false;
> > > }
> > > }
> > > }
> > >
> > > Separately, I was thinking, should we assert here about having the
> > > right lock type?
> >
> > Not sure I get what assert of what locktype where?
>
> In SetHintBitsExt() that we have share-exclusive or above.
We *don't* necessarily hold that though, it'll just be acquired by
BufferBeginSetHintBits(). Or do you mean in the SHB_ENABLED case?
> Or are there still callers with only a share lock?
Almost all of them, I think? We can't just call this with an unconditional
share-exclusive lock, since that'd destroy concurrency. Instead we just try to
upgrade to share-exclusive to set the hint bit. If we can't get
share-exclusive, we don't need to block, we just haven't set the hint bit.
> > > @@ -1628,6 +1701,9 @@ HeapTupleSatisfiesMVCCBatch(Snapshot snapshot,
> > > Buffer buffer,
> > > + if (state == SHB_ENABLED)
> > > + BufferFinishSetHintBits(buffer, true, true);
> > > +
> > > return nvis;
> > > }
> > >
> > > I wondered if it would be more natural for BufferBeginSetHintBits()
> > > and BufferFinishSetHintBits() to set SHB_INITIAL and SHB_DISABLED
> > > instead of having callers do it. But, I guess you don't do this
> > > because of gist and hash indexes using this for doing their own
> > > modifications.
> >
> > I thought about it. But yea, the different callers seemed to make that not
> > really useful. It'd also mean that we'd do an external function call for every
> > tuple, which I think would be prohibitively expensive.
>
> I'm confused why this would mean more external function calls.
The visibility logic is in heapam_visibility.c, the locking for the buffer in
bufmgr.c? If the knowledge about SetHintBitsState lives in
BufferBeginSetHintBits(), we need to call it to do that check.
> > > --- a/src/backend/storage/freespace/freespace.c
> > > +++ b/src/backend/storage/freespace/freespace.c
> > > @@ -904,14 +904,22 @@ fsm_vacuum_page(Relation rel, FSMAddress addr,
> > > max_avail = fsm_get_max_avail(page);
> > > /*
> > > - * Reset the next slot pointer. This encourages the use of low-numbered
> > > - * pages, increasing the chances that a later vacuum can truncate the
> > > - * relation. We don't bother with a lock here, nor with marking the page
> > > - * dirty if it wasn't already, since this is just a hint.
> > > + * Try to reset the next slot pointer. This encourages the use of
> > > + * low-numbered pages, increasing the chances that a later vacuum can
> > > + * truncate the relation. We don't bother with a lock here, nor with
> > > + * marking the page dirty if it wasn't already, since this is just a hint.
> > > + *
> > > + * To be allowed to update the page without an exclusive lock, we have to
> > > + * use the hint bit infrastructure.
> > > */
> > >
> > > What the heck? This didn't even take a share lock before...
> >
> > It is insane. I do not understand how anybody thought this was ok. I think I
> > probably should split this out into a separate commit, since it's so insane.
>
> Aren't you going to backport having it take a lock? Then it can be
> separate (e.g. have it take a share lock in the "fix" commit and then
> this commit bumps it to share-exclusive).
I wasn't thinking we should backport that. It's been this way for ages,
without having caused known issues....
> > > diff --git a/src/backend/storage/freespace/fsmpage.c
> > > b/src/backend/storage/freesp>
> > > /*
> > > * Update the next-target pointer. Note that we do this even if we're only
> > > * holding a shared lock, on the grounds that it's better to use a shared
> > > * lock and get a garbled next pointer every now and then, than take the
> > > * concurrency hit of an exclusive lock.
> > >
> > > We appear to avoid the garbling now?
> >
> > I don't think so. Two backends concurrently can do fsm_search_avail() and
> > one backend might set a hint to a page that is already used up by the other
> > one. At least I think so?
>
> Maybe I don't know what it meant by garbled, but I thought it was
> talking about two backends each trying to set fp_next_slot. If they
> now have to have a share-exclusive lock and they can't both have a
> share-exclusive lock at the same time, then it seems like that
> wouldn't be a problem. It sounds like you may be talking about a
> backend taking up the freespace of a page that is referred to by the
> fp_next_slot?
Yes, a version of the latter. The value that fp_next_slot will be set to can
be outdated by the time we actually set it, unless we do all of
fsm_search_avail() under some form of exclusive lock - clearly not something
desirable.
Greetings,
Andres Freund
pgsql-hackers by date: