Thread: MAX_BACKENDS size (comment accuracy)
Hello all,
In lwlock.c, we have the following comment, related to LWLock state:
/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
#define LW_SHARED_MASK ((uint32) ((1 << 24)-1))
However, MAX_BACKENDS is set to 2^18-1. Here is the comment in postmaster.h:
/*
* Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
* for buffer references in buf_internals.h. This limitation could be lifted
* by using a 64bit state; but it's unlikely to be worthwhile as 2^18-1
* backends exceed currently realistic configurations. Even if that limitation
* were removed, we still could not a) exceed 2^23-1 because inval.c stores
* the ProcNumber as a 3-byte signed integer, b) INT_MAX/4 because some places
* compute 4*MaxBackends without any overflow check. This is rechecked in the
* relevant GUC check hooks and in RegisterBackgroundWorker().
*/
#define MAX_BACKENDS 0x3FFFF
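(Numerically - my arithmetic, for reference - those three ceilings are: the buffer-reference width, 2^18-1 = 262,143; the inval.c ProcNumber limit, 2^23-1 = 8,388,607; and the 4*MaxBackends overflow limit, INT_MAX/4 = 536,870,911. The buffer-reference width is by far the tightest.)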
2^23-1 is noted as an additional upper limit, but I wonder if it'd be correct to update the comment in lwlock.c to
/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */
I'm not sure if this could lead to us actually saving some bits in the lwlock state, or if we could do anything useful with them anyway.
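For context, here is my reading of how the 32-bit state is currently carved up (bit positions taken from lwlock.c as of this writing - a sketch, so double-check against the source):

/*
 * bit  30     LW_FLAG_HAS_WAITERS
 * bit  29     LW_FLAG_RELEASE_OK
 * bit  28     LW_FLAG_LOCKED
 * bit  24     LW_VAL_EXCLUSIVE
 * bits 0-23   LW_SHARED_MASK (count of shared lock holders)
 *
 * Since the shared count can never exceed MAX_BACKENDS = 2^18-1, bits
 * 18-23 of the shared-count field can never actually be set.
 */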
Jacob
On Sat, Jan 25, 2025 at 11:35 PM Jacob Brazeal <jacob.brazeal@gmail.com> wrote:
While we are on the topic of comments from lwlock.c, there is one other one that confused me, in LWLockWaitListLock:
/* and then spin without atomic operations until lock is released */
{
SpinDelayStatus delayStatus;
init_local_spin_delay(&delayStatus);
while (old_state & LW_FLAG_LOCKED)
{
perform_spin_delay(&delayStatus);
old_state = pg_atomic_read_u32(&lock->state);
}
#ifdef LWLOCK_STATS
delays += delayStatus.delays;
#endif
finish_spin_delay(&delayStatus);
}
It seems that we *are* using an atomic operation in the loop (though no compare-and-set, etc.). I might be misreading the intent of the comment, but I'm curious if there's a way to reword it, too.
On Sat, Jan 25, 2025 at 4:06 PM Jacob Brazeal <jacob.brazeal@gmail.com> wrote:
Thinking a bit further about this, the purpose of the LW_SHARED_MASK section of the state is to count the number of lock-sharers. Thus, we only care about the actual number of backends (up to 2^18-1) here and not the size of the ProcNumber data type. So I do think the comment should read 2^18-1 and not 2^23-1. Here is a patch to that effect.
Hi,

On 2025-01-25 16:06:29 -0800, Jacob Brazeal wrote:
> > In lwlock.c, we have the following comment, related to LWLock state:
> >
> > /* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
> > #define LW_SHARED_MASK ((uint32) ((1 << 24)-1))
> >
> > However, MAX_BACKENDS is set to 2^18-1. Here is the comment in postmaster.h:
> >
> > /*
> >  * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
> >  * for buffer references in buf_internals.h. [...]
> >  */
> > #define MAX_BACKENDS 0x3FFFF
> >
> > 2^23-1 is noted as an additional upper limit, but I wonder if it'd be
> > correct to update the comment in lwlock.c to

> Thinking a bit further about this, the purpose of the LW_SHARED_MASK
> section of the state is to count the number of lock-sharers. Thus, we only
> care about the actual number of backends (up to 2^18-1) here and not the
> size of the ProcNumber data type. So I do think the comment should read
> 2^18-1 and not 2^23-1. Here is a patch to that effect.

At the time that comment was written, MAX_BACKENDS was 2^23-1. Alexander and I lowered MAX_BACKENDS to 2^18-1 in 48354581a49c and apparently didn't think about the comment in lwlock.c.

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres@anarazel.de>
Date:   2016-04-10 20:12:32 -0700

    Allow Pin/UnpinBuffer to operate in a lockfree manner.
    ...
    To allow atomic operations to be used, cram BufferDesc's flags,
    usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
    that allows to manipulate them together using 32bit compare-and-swap
    operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
    be lifted by using a 64bit field, but it's not a realistic configuration
    atm).

> > /* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */
> >
> > I'm not sure if this could lead to us actually saving some bits in the
> > lwlock state, or if we could do anything useful with them anyway.

It's not quite enough bits to be useful, I think. If we could free 16 bits (or we defined LWLock.tranche to be narrower), we could store the tranche as part of the atomic and save 4 bytes (2 bytes of alignment padding, 2 bytes for the tranche). But unless we reduce the size of the tranche field a decent bit, there's not enough space.

I'd really like to reduce the size of struct LWLock, but I think that'll need a bit more changes. I think we should use a single 64bit atomic and store the list of waiters inside the atomic:

  flags:              3 bits
  exclusively locked: 1 bit
  share locks:        18 bits
  head of wait list:  18 bits (proc number offset)
  tail of wait list:  18 bits (proc number offset)
  total:              58 bits

That doesn't leave enough space for the tranche, unfortunately. But if we reduced MAX_BACKENDS to 2^16-1 and reduced the size of the tranche to 12 bits, it'd work!
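To make that budget concrete, here is a rough sketch of how such a packed 64-bit state could be laid out. All of the names and bit positions below are invented for illustration; nothing like this exists in the tree today:

#include "c.h"                          /* uint32, uint64, UINT64CONST */

/* Hypothetical field widths, matching the budget above (58 of 64 bits) */
#define LW64_FLAG_BITS          3
#define LW64_EXCLUSIVE_BITS     1
#define LW64_SHARED_BITS        18
#define LW64_WAIT_HEAD_BITS     18
#define LW64_WAIT_TAIL_BITS     18

/* Pack the fields from the least significant bit upward */
#define LW64_EXCLUSIVE_SHIFT    LW64_FLAG_BITS
#define LW64_SHARED_SHIFT       (LW64_EXCLUSIVE_SHIFT + LW64_EXCLUSIVE_BITS)
#define LW64_WAIT_HEAD_SHIFT    (LW64_SHARED_SHIFT + LW64_SHARED_BITS)
#define LW64_WAIT_TAIL_SHIFT    (LW64_WAIT_HEAD_SHIFT + LW64_WAIT_HEAD_BITS)

/* Extract one field from the packed state */
#define LW64_FIELD(state, shift, bits) \
        (((state) >> (shift)) & ((UINT64CONST(1) << (bits)) - 1))

/* e.g. reading the wait-list head as a proc number offset */
static inline uint32
lw64_wait_head(uint64 state)
{
        return (uint32) LW64_FIELD(state, LW64_WAIT_HEAD_SHIFT, LW64_WAIT_HEAD_BITS);
}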
I continue to believe that MAX_BACKENDS of 2^16-1 would be sufficient - we're far from that being a realistic limit. Halving the size of LWLock and laying the groundwork for making the wait-list lock-free imo would be very well worth the reduction in an unrealistic limit... Anyway, that's really just a periodic rant / project suggestion...

> diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
> index 2f558ffea1..d3a2619072 100644
> --- a/src/backend/storage/lmgr/lwlock.c
> +++ b/src/backend/storage/lmgr/lwlock.c
> @@ -99,7 +99,7 @@
>  #define LW_VAL_SHARED 1
>
>  #define LW_LOCK_MASK ((uint32) ((1 << 25)-1))
> -/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
> +/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */
>  #define LW_SHARED_MASK ((uint32) ((1 << 24)-1))
>
>  StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,

I'm inclined to think that we should adjust the various constants at the same time as the comment? It's imo somewhat weird to have bits in LW_SHARED_MASK that can never be set... I wonder if we should instead define LW_VAL_EXCLUSIVE and LW_SHARED_MASK using MAX_BACKENDS and have a static assertion ensuring they don't overlap with the flag bits? That way we don't have two sets of defines to keep in sync.
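For what it's worth, a minimal sketch of that last idea - deriving the constants from MAX_BACKENDS - might look like this (my guess at the shape, not an actual patch; LW_FLAG_MASK is assumed to cover the flag bits at the top of the word):

/* Derive the lock-state constants from MAX_BACKENDS, so there is only
 * one definition to keep in sync. */
#define LW_VAL_SHARED           1
#define LW_SHARED_MASK          ((uint32) MAX_BACKENDS)
#define LW_VAL_EXCLUSIVE        ((uint32) MAX_BACKENDS + 1)
#define LW_LOCK_MASK            (LW_SHARED_MASK | LW_VAL_EXCLUSIVE)

StaticAssertDecl((LW_LOCK_MASK & LW_FLAG_MASK) == 0,
                 "lock bits and flag bits overlap");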
Greetings,

Andres Freund

Hi,

On 2025-01-25 23:35:51 -0800, Jacob Brazeal wrote:
> While we are on the topic of comments from lwlock.c, there is one other one
> that confused me, in LWLockWaitListLock:
>
> /* and then spin without atomic operations until lock is released */
> [...]
> while (old_state & LW_FLAG_LOCKED)
> {
>     perform_spin_delay(&delayStatus);
>     old_state = pg_atomic_read_u32(&lock->state);
> }
> [...]
>
> It seems that we *are* using an atomic operation in the loop (though no
> compare-and-set, etc.). I might be misreading the intent of the comment,
> but I'm curious if there's a way to reword it, too.

It's not really an atomic operation. It's just reading an atomic variable (which just guarantees that the compiler isn't eliding the read and that the read isn't torn).

Personally I don't think there's a need to rephrase the comment, but I probably wrote it, so take that with a grain of salt.

Greetings,

Andres Freund

PS: FYI, this list values properly quoting messages instead of replying on top of the entire quoted messages.
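A minimal sketch of the pattern in question, for illustration (simplified from LWLockWaitListLock; the LW_FLAG_LOCKED value is copied from lwlock.c, and the spin-delay bookkeeping is omitted):

#include "port/atomics.h"

#define LW_FLAG_LOCKED  ((uint32) 1 << 28)  /* as in lwlock.c */

static void
lock_wait_list(pg_atomic_uint32 *state)
{
    for (;;)
    {
        uint32      old_state;

        /* the expensive part: an atomic read-modify-write */
        old_state = pg_atomic_fetch_or_u32(state, LW_FLAG_LOCKED);
        if (!(old_state & LW_FLAG_LOCKED))
            return;             /* the bit was clear and we set it */

        /* the cheap part: spin using plain atomic reads only */
        while (pg_atomic_read_u32(state) & LW_FLAG_LOCKED)
            ;                   /* perform_spin_delay() would go here */
    }
}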
I find I didn't send the previous reply to the mailing list, so I'll copy it here.
---
The patch series looks good. It looks like this currently leaves 10 bits of unused space (bits 20 - 29) in the state.
> StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
> "MAX_BACKENDS and LW_FLAG_MASK overlap");
> "MAX_BACKENDS and LW_FLAG_MASK overlap");
Should this check that MAX_BACKENDS & LW_LOCK_MASK == 0, to also ensure the LW_VAL_EXCLUSIVE bit does not overlap?
> I continue to believe that MAX_BACKENDS of 2^16-1 would be sufficient - we're
> far from that being a realistic limit. Halving the size of LWLock and laying
> the groundwork for making the wait-list lock-free imo would be very well
> worth the reduction in an unrealistic limit...
Neat. The current implementation of queuing does seem pretty heavy, and I'd have time to work on a lock-free version. It seems like the waitlist state itself could be managed similarly to LWLockAttemptLock, with an atomic compare-and-set. I'm not quite sure how to manage the full proclist queue, since only the head and tail would actually be part of the LWLock; would we need to do something like copy the whole list, add our process to the copied queue, and then swap out the reference to the new list in the LWLock?
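Sketching the simplest alternative I can think of: if each waiter's "next" link lived in its own PGPROC, then nothing would need to be copied - only the head stored in the lock's atomic gets swung with a compare-and-swap. Something like the following, where waitlist_head(), waitlist_set_head(), and lwWaitNext are all invented names, and ABA hazards and tail maintenance are hand-waved away:

#include "port/atomics.h"
#include "storage/proc.h"

static void
push_self_onto_waitlist(pg_atomic_uint64 *state, uint32 my_procno)
{
    uint64      old_state = pg_atomic_read_u64(state);

    for (;;)
    {
        uint64      new_state;

        /* link our PGPROC entry to the current head ... */
        MyProc->lwWaitNext = waitlist_head(old_state);
        /* ... then try to install ourselves as the new head */
        new_state = waitlist_set_head(old_state, my_procno);
        if (pg_atomic_compare_exchange_u64(state, &old_state, new_state))
            break;              /* on failure, old_state is refreshed */
    }
}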
> PS: FYI, this list values properly quoting messages instead of replying on top
> of the entire quoted messages.
Oops, thank you for the heads up. Hopefully this reply is formatted correctly, I'm still getting the hang of things.
Regards,
Jacob
Looking at v1-0003-WIP-Base-LWLock-limits-directly-on-MAX_BACKENDS.patch, I'm curious about the following assert:
> #define LW_VAL_EXCLUSIVE (MAX_BACKENDS + 1)
> ...
> StaticAssertDecl(MAX_BACKENDS < LW_VAL_EXCLUSIVE,
>                  "MAX_BACKENDS too big for lwlock.c");

Since LW_VAL_EXCLUSIVE is already defined as MAX_BACKENDS + 1, is this basically just checking for an integer overflow?
I had a typo earlier: I should have said:
> StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
> "MAX_BACKENDS and LW_FLAG_MASK overlap");
> "MAX_BACKENDS and LW_FLAG_MASK overlap");
Should this check that LW_LOCK_MASK & LW_FLAG_MASK == 0, to also ensure the LW_VAL_EXCLUSIVE bit does not overlap?