Thread: Improve LWLock tranche name visibility across backends

Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
Hi,

This is a follow-up to a discussion started in [0].

LWLocks in PostgreSQL are categorized into tranches, and the tranche name
appears as the wait_event in pg_stat_activity. There are both built-in
tranche names and tranche names that can be registered by extensions using
RequestNamedLWLockTranche() or LWLockRegisterTranche().

Tranche names are stored in process-local memory when registered. If a
tranche is registered during postmaster startup, such as with built-in
tranches or those registered via RequestNamedLWLockTranche(), its name is
inherited by backend processes via fork(). However, if a tranche is
registered dynamically by a backend using LWLockRegisterTranche(), other
backends will not be aware of it unless they explicitly register it as well.

Consider a case in which an extension allows a backend to attach a new
dshash via the GetNamedDSHash API and supplies a tranche name like
"MyUsefulExtension". The first backend to call GetNamedDSHash will
initialize an LWLock using the extension-defined tranche name and associate
it with a tranche ID in local memory. Other backends that later attach to
the same dshash will also learn about the tranche name and ID. Backends
that do not attach the dshash will not know this tranche name. This
results in differences in how wait events are reported in pg_stat_activity.

When querying pg_stat_activity, the function pgstat_get_wait_event is
called, which internally uses GetLWLockIdentifier and GetLWTrancheName
to map the LWLock to its tranche name. If the backend does not recognize
the tranche ID, a fallback name "extension" is used. Therefore, backends
that have registered the tranche will report the correct extension-defined
tranche name, while others will report the generic fallback of "extension".

i.e.
```
postgres=# select wait_event, wait_event_type from pg_stat_activity;
-[ RECORD 1 ]---+--------------------
wait_event      | extension
wait_event_type | LWLock
```
instead of
```
postgres=# select wait_event, wait_event_type from pg_stat_activity;
-[ RECORD 1 ]---+--------------------
wait_event      | MyUsefulExtension
wait_event_type | LWLock
```
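
The difference between the two outputs above can be reproduced with a small standalone model. This is only a sketch and not PostgreSQL code: `ToyBackend`, `toy_backend_register`, `toy_backend_tranche_name` and `TOY_BACKEND_MAX_TRANCHES` are hypothetical names made up for illustration, and each struct stands in for one backend's process-local name table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BACKEND_MAX_TRANCHES 8	/* hypothetical size for this toy model */

/* Toy stand-in for one backend's process-local tranche name table. */
typedef struct ToyBackend
{
	const char *names[TOY_BACKEND_MAX_TRANCHES];
} ToyBackend;

/* Record a name in this backend only; other backends are unaffected. */
int
toy_backend_register(ToyBackend *backend, int tranche_id, const char *name)
{
	assert(name != NULL);
	if (tranche_id < 0 || tranche_id >= TOY_BACKEND_MAX_TRANCHES)
		return -1;
	backend->names[tranche_id] = name;
	return 0;
}

/* Resolve an ID to a name, falling back to "extension" when unknown. */
const char *
toy_backend_tranche_name(const ToyBackend *backend, int tranche_id)
{
	if (tranche_id < 0 || tranche_id >= TOY_BACKEND_MAX_TRANCHES ||
		backend->names[tranche_id] == NULL)
		return "extension";
	return backend->names[tranche_id];
}
```

If one `ToyBackend` registers "MyUsefulExtension" under some ID and a second one never does, the first resolves that ID to the extension-defined name while the second gets "extension", which is the inconsistency described above.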

This is the current design, but I think we can do better to avoid the
inconsistencies this may lead to for monitoring tools and diagnostics.

To improve this, we could store tranche names registered by a normal backend
in shared memory, for example in a dshash, allowing tranche names to be
resolved even by backends that have not explicitly registered them. This
would lead to more consistent behavior, particularly as more extensions
adopt APIs like GetNamedDSHash, where tranche names are registered by the
backend rather than the postmaster.

Attached is a proof of concept that does not alter the
LWLockRegisterTranche API. Instead, it detects when a registration is
performed by a normal backend and stores the tranche name in shared memory,
using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
the order of built-in names, the local list, and finally the shared memory.
The fallback name "extension" can still be returned if an extension does
not register a tranche.

An exclusive lock is taken when adding a new tranche, which should be a rare
occurrence. A shared lock is taken when looking up a tranche name via
GetLWTrancheName.
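
To illustrate the lookup order described above (built-in names, then the local list, then shared memory, then the fallback), here is a minimal standalone sketch. It is not the attached patch: the `Tiered*` types, `tiered_register`, `tiered_lookup` and the `TIERED_*` constants are hypothetical, a plain array stands in for the dshash, and the shared/exclusive locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TIERED_NUM_BUILTIN 2	/* hypothetical count of built-in tranches */
#define TIERED_MAX_ID      16	/* hypothetical upper bound on tranche IDs */
#define TIERED_NAME_LEN    64	/* hypothetical cap on stored name length */

static const char *const tiered_builtin[TIERED_NUM_BUILTIN] = {"BuiltinA", "BuiltinB"};

/* Names this backend registered itself (process-local). */
typedef struct TieredLocal
{
	const char *names[TIERED_MAX_ID];
} TieredLocal;

/* Stand-in for the shared table; the proof of concept uses a dshash. */
typedef struct TieredShared
{
	char		names[TIERED_MAX_ID][TIERED_NAME_LEN];
} TieredShared;

/* Remember the name locally and publish a truncated copy in the shared table. */
int
tiered_register(TieredLocal *local, TieredShared *shared, int id, const char *name)
{
	size_t		len;

	assert(name != NULL);
	if (id < TIERED_NUM_BUILTIN || id >= TIERED_MAX_ID)
		return -1;
	local->names[id] = name;
	len = strlen(name);
	if (len >= TIERED_NAME_LEN)
		len = TIERED_NAME_LEN - 1;
	memcpy(shared->names[id], name, len);
	shared->names[id][len] = '\0';
	return 0;
}

/* Look up in order: built-in, local, shared, then the "extension" fallback. */
const char *
tiered_lookup(const TieredLocal *local, const TieredShared *shared, int id)
{
	if (id >= 0 && id < TIERED_NUM_BUILTIN)
		return tiered_builtin[id];
	if (id >= TIERED_NUM_BUILTIN && id < TIERED_MAX_ID)
	{
		if (local->names[id] != NULL)
			return local->names[id];
		if (shared->names[id][0] != '\0')
			return shared->names[id];
	}
	return "extension";
}
```

In this model, a backend whose `TieredLocal` is empty still resolves an ID registered by another backend, as long as both point at the same `TieredShared`.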

There are still some open questions I have:

1/ There is currently no mechanism for deleting entries. I am not sure whether
this is a concern, since the size of the table would grow only with the
number of extensions and the number of LWLocks they initialize, which is
typically small. That said, others may have different thoughts on this.

2/ What is the appropriate size limit for a tranche name? The work done
in [0] caps the tranche name to 128 bytes for the dshash tranche, and
128 bytes + length of " DSA" suffix for the dsa tranche. Also, the
existing RequestNamedLWLockTranche caps the name to NAMEDATALEN. Currently,
LWLockRegisterTranche does not have a limit on the tranche name. I wonder
if we also need to take care of this and implement some common limit that
applies to tranche names regardless of how they're created?
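
As a rough illustration of what a common limit for question 2/ could look like, here is a standalone sketch of a single length check that every registration path would share. The names `TOY_TRANCHE_NAME_MAX`, `tranche_name_fits` and `tranche_name_store` are hypothetical, and the value 64 is only an assumption chosen to match the default NAMEDATALEN; which limit is appropriate is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical common cap in bytes, including the terminating NUL. */
#define TOY_TRANCHE_NAME_MAX 64

/* True if the name plus its terminating NUL fits within the cap. */
bool
tranche_name_fits(const char *name)
{
	assert(name != NULL);
	return strlen(name) + 1 <= TOY_TRANCHE_NAME_MAX;
}

/*
 * Copy the name into a buffer of TOY_TRANCHE_NAME_MAX bytes.  Returns false
 * and leaves the buffer empty if the name is too long, rather than
 * truncating silently.
 */
bool
tranche_name_store(char *dst, const char *name)
{
	dst[0] = '\0';
	if (!tranche_name_fits(name))
		return false;
	memcpy(dst, name, strlen(name) + 1);
	return true;
}
```

Rejecting an over-long name is one possible policy; truncating it, as a fixed-size shared table might, is another.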

[0] https://www.postgresql.org/message-id/aEiTzmndOVPmA6Mm%40nathan

--

Sami Imseih
Amazon Web Services (AWS)

Attachment

Re: Improve LWLock tranche name visibility across backends

From
Bertrand Drouvot
Date:
Hi,

On Wed, Jul 09, 2025 at 04:39:48PM -0500, Sami Imseih wrote:
> Hi,
> 
> When querying pg_stat_activity, the function pgstat_get_wait_event is
> called, which internally uses GetLWLockIdentifier and GetLWTrancheName
> to map the LWLock to its tranche name. If the backend does not recognize
> the tranche ID, a fallback name "extension" is used. Therefore, backends
> that have registered the tranche will report the correct extension-defined
> tranche name, while others will report the generic fallback of "extension".
> 
> i.e.
> ```
> postgres=# select wait_event, wait_event_type from pg_stat_activity;
> -[ RECORD 1 ]---+--------------------
> wait_event      | extension
> wait_event_type | LWLock
> ```
> instead of
> ```
> postgres=# select wait_event, wait_event_type from pg_stat_activity;
> -[ RECORD 1 ]---+--------------------
> wait_event      | MyUsefulExtension
> wait_event_type | LWLock
> ```
> 
> This is the current design, but I think we can do better to avoid the
> inconsistencies this may lead to for monitoring tools and diagnostics.

+1 on finding a way to improve this, thanks for looking at it.

> Attached is a proof of concept that does not alter the
> LWLockRegisterTranche API. Instead, it detects when a registration is
> performed by a normal backend and stores the tranche name in shared memory,
> using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
> the order of built-in names, the local list, and finally the shared memory.
> The fallback name "extension" can still be returned if an extension does
> not register a tranche.

I did not look in detail, but do you think we could make use of
WaitEventCustomNew()?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
Thanks for the feedback!

> > Attached is a proof of concept that does not alter the
> > LWLockRegisterTranche API. Instead, it detects when a registration is
> > performed by a normal backend and stores the tranche name in shared memory,
> > using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
> > the order of built-in names, the local list, and finally the shared memory.
> > The fallback name "extension" can still be returned if an extension does
> > not register a tranche.
>
> I did not look in detail, but do you think we could make use of
> WaitEventCustomNew()?

It looks like I overlooked the custom wait event, so I didn’t take it into
account initially. That said, I do think it’s reasonable to consider
piggybacking on this infrastructure.

After all, LWLockRegisterTranche is already creating a custom wait event
defined by the extension. The advantage here is that we can avoid creating
new shared memory and instead reuse the existing static hash table, which is
capped at 128 custom wait events:

```
#define WAIT_EVENT_CUSTOM_HASH_MAX_SIZE 128
```

However, WaitEventCustomNew as it currently stands won’t work for our use
case, since it assigns an eventId automatically. The API currently takes a
classId and wait_event_name, but in our case, we’d actually want to pass in a
trancheId.

So, we might need a new API, something like:
```
uint32 WaitEventCustomNewWithEventId(uint32 classId, uint16 eventId,
                                     const char *wait_event_name);
```
In the LWLock case, eventId would be a trancheId that was generated
by the user in some earlier step, like LWLockInitialize.

This would behave the same as the existing WaitEventCustomNew API,
except that it uses the provided eventId.

Or maybe we can just allow WaitEventCustomNew to take in the eventId and,
if it's > 0, use the passed-in value; otherwise generate the next eventId.

I like the latter approach more. What do you think?

With this API, we can then teach LWLockRegisterTranche to register the
custom wait event.
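
To make the second option a bit more concrete, here is a standalone toy model of "use the passed-in eventId if it's > 0, otherwise generate the next one". It is not the real wait_event.c code: `ToyEventRegistry`, `toy_custom_event_new` and the helper are hypothetical, a linear array replaces the hash table, and the collision rule (refuse an ID that is already taken, skip taken IDs when generating) is just one assumption about how conflicts might be handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_EVENTS 128		/* same value as WAIT_EVENT_CUSTOM_HASH_MAX_SIZE */

typedef struct ToyEvent
{
	uint32_t	class_id;
	uint16_t	event_id;
	const char *name;
} ToyEvent;

/* Must be zero-initialized before first use. */
typedef struct ToyEventRegistry
{
	ToyEvent	events[TOY_MAX_EVENTS];
	int			count;
	uint16_t	next_event_id;	/* last auto-generated ID; 0 means none yet */
} ToyEventRegistry;

/* Is (class_id, event_id) already present in the registry? */
static int
toy_event_id_taken(const ToyEventRegistry *reg, uint32_t class_id, uint16_t event_id)
{
	for (int i = 0; i < reg->count; i++)
		if (reg->events[i].class_id == class_id &&
			reg->events[i].event_id == event_id)
			return 1;
	return 0;
}

/*
 * Register a name.  event_id > 0 means "use this ID" (for example a tranche
 * ID); event_id == 0 means "generate the next free ID".  Returns the ID that
 * was used, or 0 if the table is full or the requested ID is already taken.
 */
uint16_t
toy_custom_event_new(ToyEventRegistry *reg, uint32_t class_id,
					 uint16_t event_id, const char *name)
{
	assert(name != NULL);
	if (reg->count >= TOY_MAX_EVENTS)
		return 0;
	if (event_id > 0)
	{
		if (toy_event_id_taken(reg, class_id, event_id))
			return 0;
	}
	else
	{
		do
			event_id = ++reg->next_event_id;
		while (event_id != 0 && toy_event_id_taken(reg, class_id, event_id));
		if (event_id == 0)
			return 0;			/* counter wrapped around */
	}
	reg->events[reg->count].class_id = class_id;
	reg->events[reg->count].event_id = event_id;
	reg->events[reg->count].name = name;
	reg->count++;
	return event_id;
}
```

One thing the model makes visible is that caller-supplied and generated IDs share a number space, so the generator has to skip IDs that a caller already claimed.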

--
Sami