Thread: MAX_BACKENDS size (comment accuracy)

MAX_BACKENDS size (comment accuracy)

From
Jacob Brazeal
Date:
Hello all,

In lwlock.c, we have the following comment, related to LWLock state:

/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
#define LW_SHARED_MASK ((uint32) ((1 << 24)-1))


However, MAX_BACKENDS is set to 2^18-1. Here is the comment in postmaster.h:

/*
 * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
 * for buffer references in buf_internals.h.  This limitation could be lifted
 * by using a 64bit state; but it's unlikely to be worthwhile as 2^18-1
 * backends exceed currently realistic configurations. Even if that limitation
 * were removed, we still could not a) exceed 2^23-1 because inval.c stores
 * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4 because some places
 * compute 4*MaxBackends without any overflow check.  This is rechecked in the
 * relevant GUC check hooks and in RegisterBackgroundWorker().
 */
#define MAX_BACKENDS 0x3FFFF


2^23-1 is noted as an additional upper limit, but I wonder if it'd be correct to update the comment in lwlock.c to

/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */

I'm not sure if this could lead to us actually saving some bits in the lwlock state, or if we could do anything useful with them anyway.

Jacob

Re: MAX_BACKENDS size (comment accuracy)

From
Jacob Brazeal
Date:
While we are on the topic of comments from lwlock.c, there is one other one that confused me, in LWLockWaitListLock:

	/* and then spin without atomic operations until lock is released */
	{
		SpinDelayStatus delayStatus;

		init_local_spin_delay(&delayStatus);

		while (old_state & LW_FLAG_LOCKED)
		{
			perform_spin_delay(&delayStatus);
			old_state = pg_atomic_read_u32(&lock->state);
		}
#ifdef LWLOCK_STATS
		delays += delayStatus.delays;
#endif
		finish_spin_delay(&delayStatus);
	}


It seems that we *are* using an atomic operation in the loop (though not a compare-and-swap or similar). I might be misreading the intent of the comment, but I'm curious whether there's a way to reword it, too.

On Sat, Jan 25, 2025 at 4:06 PM Jacob Brazeal <jacob.brazeal@gmail.com> wrote:
Thinking a bit further about this, the purpose of the LW_SHARED_MASK section of the state is to count the number of lock-sharers. Thus, we only care about the actual number of backends (up to 2^18-1) here and not the size of the ProcNumber data type. So I do think the comment should read 2^18-1 and not 2^23-1. Here is a patch to that effect.


Re: MAX_BACKENDS size (comment accuracy)

From
Andres Freund
Date:
Hi,

On 2025-01-25 16:06:29 -0800, Jacob Brazeal wrote:
> > In lwlock.c, we have the following comment, related to LWLock state:
> >
> > /* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
> > #define LW_SHARED_MASK ((uint32) ((1 << 24)-1))
> >
> > However, MAX_BACKENDS is set to 2^18-1. Here is the comment in
> > postmaster.h:

> > /*
> >  * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
> >  * for buffer references in buf_internals.h.  This limitation could be lifted
> >  * by using a 64bit state; but it's unlikely to be worthwhile as 2^18-1
> >  * backends exceed currently realistic configurations. Even if that limitation
> >  * were removed, we still could not a) exceed 2^23-1 because inval.c stores
> >  * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4 because some places
> >  * compute 4*MaxBackends without any overflow check.  This is rechecked in the
> >  * relevant GUC check hooks and in RegisterBackgroundWorker().
> >  */
> > #define MAX_BACKENDS 0x3FFFF
> >
> > 2^23-1 is noted as an additional upper limit, but I wonder if it'd be
> > correct to update the comment in lwlock.c to

> Thinking a bit further about this, the purpose of the LW_SHARED_MASK
> section of the state is to count the number of lock-sharers. Thus, we only
> care about the actual number of backends (up to 2^18-1) here and not the
> size of the ProcNumber data type. So I do think the comment should read
> 2^18-1 and not 2^23-1. Here is a patch to that effect.

At the time that comment was written MAX_BACKENDS was 2^23-1. Alexander and I
lowered MAX_BACKENDS in 48354581a49c to 2^18-1 and apparently didn't think
about the comment in lwlock.c.

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres@anarazel.de>
Date:   2016-04-10 20:12:32 -0700

    Allow Pin/UnpinBuffer to operate in a lockfree manner.
...

    To allow atomic operations to be used, cram BufferDesc's flags,
    usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
    that allows to manipulate them together using 32bit compare-and-swap
    operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
    be lifted by using a 64bit field, but it's not a realistic configuration
    atm).



> > /* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */
> >
> > I'm not sure if this could lead to us actually saving some bits in the
> > lwlock state, or if we could do anything useful with them anyway.

It's not quite enough bits to be useful I think. If we could free 16 bits (or
we defined LWLock.tranche to be narrower), we could store the tranche as part
of the atomic, and save 4 bytes (2 bytes of alignment padding, 2 bytes for the
tranche). But unless we reduce the size of the tranche field a decent bit
there's not enough space.

I'd really like to reduce the size of struct LWLock, but I think that'll need
a bit more changes. I think we should use a single 64bit atomic and store the
list of waiters inside the atomic.

flags: 3 bits
exclusively locked: 1 bit
share locks: 18 bits
head of wait list: 18 bits (proc number offset)
tail of wait list: 18 bits (proc number offset)
= 58 bits

That doesn't leave enough space for the tranche unfortunately. But if we
reduced MAX_BACKENDS to 2^16-1 and reduced the size of the tranche to 12
bits, it'd work!

I continue to believe that MAX_BACKENDS of 2^16-1 would be sufficient - we're
far from that being a realistic limit.  Halving the size of LWLock and laying
the groundwork for making the wait-list lock-free imo would be very well
worth the reduction in an unrealistic limit...

Anyway, that's really just a periodic rant / project suggestion...


> diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
> index 2f558ffea1..d3a2619072 100644
> --- a/src/backend/storage/lmgr/lwlock.c
> +++ b/src/backend/storage/lmgr/lwlock.c
> @@ -99,7 +99,7 @@
>  #define LW_VAL_SHARED 1
>
>  #define LW_LOCK_MASK                ((uint32) ((1 << 25)-1))
> -/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
> +/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */
>  #define LW_SHARED_MASK                ((uint32) ((1 << 24)-1))
>
>  StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,

I'm inclined to think that we should adjust the various constants at the same
time as the comment? It's imo somewhat weird to have bits in LW_SHARED_MASK
that can never be set...

I wonder if we should instead define LW_VAL_EXCLUSIVE and LW_SHARED_MASK using
MAX_BACKENDS and have a static assertion ensuring it doesn't overlap with flag
bits?  That way we don't have two sets of defines to keep in sync.

Greetings,

Andres Freund



Re: MAX_BACKENDS size (comment accuracy)

From
Andres Freund
Date:
Hi,

On 2025-01-25 23:35:51 -0800, Jacob Brazeal wrote:
> While we are on the topic of comments from lwlock.c, there is one other one
> that confused me, in LWLockWaitListLock:

> >   /* and then spin without atomic operations until lock is released */
> >   {
> >       SpinDelayStatus delayStatus;
> >
> >       init_local_spin_delay(&delayStatus);
> >
> >       while (old_state & LW_FLAG_LOCKED)
> >       {
> >           perform_spin_delay(&delayStatus);
> >           old_state = pg_atomic_read_u32(&lock->state);
> >       }
> > #ifdef LWLOCK_STATS
> >       delays += delayStatus.delays;
> > #endif
> >       finish_spin_delay(&delayStatus);
> >   }
> 
> It seems that we *are* using an atomic operation in the loop (though not a
> compare-and-swap or similar). I might be misreading the intent of the
> comment, but I'm curious whether there's a way to reword it, too.

It's not really an atomic operation. It's just reading an atomic variable
(which just guarantees that the compiler isn't eliding the read and that the
read isn't torn).  Personally I don't think there's a need to rephrase the
comment, but I probably wrote it, so take that with a grain of salt.

Greetings,

Andres Freund

PS: FYI, this list values properly quoting messages instead of replying on top
of the entire quoted message.



Re: MAX_BACKENDS size (comment accuracy)

From
Jacob Brazeal
Date:
I realized I didn't send the previous reply to the mailing list, so I'll copy it here.
---
The patch series looks good. It looks like this currently leaves 10 bits of unused space (bits 20 - 29) in the state.

> StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
>     "MAX_BACKENDS and LW_FLAG_MASK overlap");

Should this check that MAX_BACKENDS & LW_LOCK_MASK == 0? To also ensure the LW_VAL_EXCLUSIVE bit does not overlap.

> I continue to believe that MAX_BACKENDS of 2^16-1 would be sufficient - we're
> far from that being a realistic limit.  Halving the size of LWLock and laying
> the groundwork for making the wait-list lock-free imo would be very well
> worth the reduction in an unrealistic limit...

Neat. The current implementation of queuing does seem pretty heavy, and I'd have time to work on a lock-free version. It seems like the waitlist state itself could be managed similarly to LWLockAttemptLock, with an atomic compare-and-set. I'm not quite sure how to manage the full proclist queue, since only the head and tail would actually be part of the LWLock; would we need to do something like copy the whole list, add our process to the copied queue, and then swap out the reference to the new list in the LWLock?

> PS: FYI, this list values properly quoting messages instead of replying on top
> of the entire quoted messages.

Oops, thank you for the heads up. Hopefully this reply is formatted correctly, I'm still getting the hang of things.


Regards,
Jacob



Re: MAX_BACKENDS size (comment accuracy)

From
Jacob Brazeal
Date:
Looking at v1-0003-WIP-Base-LWLock-limits-directly-on-MAX_BACKENDS.patch, I'm curious about the following assert:

> #define LW_VAL_EXCLUSIVE (MAX_BACKENDS + 1)
> ...
> StaticAssertDecl(MAX_BACKENDS < LW_VAL_EXCLUSIVE,
>     "MAX_BACKENDS too big for lwlock.c");

Since LW_VAL_EXCLUSIVE is already defined as MAX_BACKENDS + 1, is this basically just checking for an integer overflow?

Re: MAX_BACKENDS size (comment accuracy)

From
Jacob Brazeal
Date:
I had a typo in my earlier question. Regarding:

> StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
>     "MAX_BACKENDS and LW_FLAG_MASK overlap");

Should this check that LW_LOCK_MASK & LW_FLAG_MASK == 0? To also ensure the LW_VAL_EXCLUSIVE bit does not overlap.