Re: Buffer locking is special (hints, checksums, AIO writes) - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Buffer locking is special (hints, checksums, AIO writes) |
Date | |
Msg-id | 20250827191441.1c.nmisch@google.com Whole thread Raw |
In response to | Re: Buffer locking is special (hints, checksums, AIO writes) (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Buffer locking is special (hints, checksums, AIO writes)
|
List | pgsql-hackers |
On Wed, Aug 27, 2025 at 12:18:27PM -0400, Andres Freund wrote: > On 2025-08-26 17:14:49 -0700, Noah Misch wrote: > > On Fri, Aug 22, 2025 at 03:44:48PM -0400, Andres Freund wrote: > > > == Problem 3 - Cacheline contention == > > > > > c) Read accesses to the BufferDesc cause contention > > > > > > Some code, like nbtree, relies on functions like > > > BufferGetBlockNumber(). Unfortunately that contends with concurrent > > > modifications of the buffer descriptor (like pinning). Potential solutions > > > are to rely less on functions like BufferGetBlockNumber() or to split out > > > the memory for that into a separate (denser?) array. > > > > Agreed. BufferGetBlockNumber() could even use a new local (non-shmem) data > > structure, since the buffer's mapping can't change until we unpin. > > Hm. I didn't think about a backend local datastructure for that, perhaps > because it seems not cheap to maintain (both from a runtime and a space > perspective). Yes, paying off the cost of maintaining it could be tricky. It could be the kind of thing where the overhead loses at 10 cores and wins at 40 cores. It could also depend heavily on the workload's concurrent pins per backend. > If we store the read-only data for buffers separately from the read-write > data, we could access that from backends without a lock, since it can't change > with the buffer pinned. Good point. That alone may be enough of a win. > One way to do that would be to maintain a back-pointer from the BufferDesc to > the BufferLookupEnt, since the latter *already* contains the BufferTag. We > probably don't want to add another indirection to the buffer mapping hash > table, otherwise we could deduplicate the other way round and just put padding > between the modified and read-only part of a buffer desc. I think you're saying clients would save the back-pointer once and dereference it many times, with each dereference of a saved back-pointer avoiding a shmem read of BufferDesc.tag. Is that right? > > On Tue, Aug 26, 2025 at 05:00:13PM -0400, Andres Freund wrote: > > > On 2025-08-26 16:21:36 -0400, Robert Haas wrote: > > > > On Fri, Aug 22, 2025 at 3:45 PM Andres Freund <andres@anarazel.de> wrote: > > > > > DOES ANYBODY HAVE A BETTER NAME THAN SHARE-EXCLUSIVE???!? > > > I would consider {AccessShare, Exclusive, AccessExclusive}. > > One thing I forgot to mention is that with the proposed re-architecture in > place, we could subsequently go further and make pinning just be a very > lightweight lock level, instead of that being a separate dedicated > infrstructure. One nice outgrowth of that would be that that acquiring a > cleanup lock would just be a real lock acquisition, instead of the dedicated > limited machinery we have right now. > > Which would leave us with: > - reference (pins today) > - share > - share-exclusive > - exclusive > - cleanup > > This doesn't quite seem to map onto the heavyweight lock levels in a sensible > way... Could map it like this: AccessShare - pins today RowShare - check tuple visibility (BUFFER_LOCK_SHARE today) Share - set hint bits ShareUpdateExclusive - clean/write out (borrowing Robert's idea) Exclusive - add tuples, change xmax, etc. (BUFFER_LOCK_EXCLUSIVE today) AccessExclusive - cleanup lock or evict the buffer That has a separate level for hint bits vs. I/O, so multiple backends could set hint bits. I don't know whether the benchmarks would favor maintaining that distinction. > > What the $SUBJECT proposal calls SHARE-EXCLUSIVE would become Exclusive. > > There are a few hundred references to the lock levels though, seems painful to > rename them :( Yes, especially in comments and extensions. Likely more important than that for the long-term, your latest proposal has the advantage of keeping short names for the most-commonly-referenced lock types. (We could keep BUFFER_LOCK_SHARE with the lower layers translating that into RowShare, but that weakens or eliminates the benefit of reducing what readers need to learn.) For what it's worth, 6 PGXN modules reference BUFFER_LOCK_SHARE and/or BUFFER_LOCK_EXCLUSIVE. Compared to share-exclusive, I think I'd prefer a name that describes the use cases, "set-hints-or-write" (or separate "write" and "set-hints" levels). What do you think of that? I don't know whether that should win vs. names like ShareUpdateExclusive, though.
pgsql-hackers by date: