Re: Add LWLock blocker(s) information - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Add LWLock blocker(s) information
Date
Msg-id CAGRY4nz=SEs3qc1R6xD3max7sg3kS-L81eJk2aLUWSQAeAFJTA@mail.gmail.com
Whole thread Raw
In response to Re: Add LWLock blocker(s) information  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Wed, Nov 18, 2020 at 5:25 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 11/08/2020 03:41, Andres Freund wrote:
> On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
>> On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>>> the patch adds into the LWLock struct:
>>>
>>>                      last_holding_pid: last pid owner of the lock
>>>                      last_mode: last holding mode of the last pid owner of the lock
>>>                      nholders: number of holders (could be >1 in case of LW_SHARED)
>>
>> There's been significant work done over the years to get the size of
>> an LWLock down; I'm not very enthusiastic about making it bigger
>> again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
>> which embeds one of the LWLocks associated with a BufferDesc into the
>> structure to reduce the number of cache lines associated with common
>> buffer operations. I'm not sure whether this patch would increase the
>> space usage of a BufferDesc to more than one cache line again, but at
>> the very least it would make it a lot tighter, since it looks like it
>> adds 12 bytes to the size of each one.
>
> +many. If anything I would like to make them *smaller*. We should strive
> to make locking more and more granular, and that requires the space
> overhead to be small.  I'm unhappy enough about the tranche being in
> there, and requiring padding etc.
>
> I spent a *LOT* of sweat getting where we are, I'd be unhappy to regress
> on size or efficiency.

That seems to be the consensus, so I'm marking this as Returned with
Feeback in the commitfest.

For what it's worth, I think that things like this are where we can really benefit from external tracing and observation tools.

Instead of tracking the information persistently in the LWLock struct, we can emit TRACE_POSTGRESQL_LWLOCK_BLOCKED_ON(...) in a context where we have the information available to us, then forget all about it. We don't spend anything unless someone's collecting the info.

If someone wants to track LWLock blocking relationships during debugging and performance work, they can use systemtap, dtrace, bpftrace, or a similar tool to observe the LWLock tracepoints and generate stats on LWLock blocking frequencies/durations. Doing so with systemtap should be rather simple.

I actually already had a look at this before. I found that the tracepoints that're in the LWLock code right now don't supply enough information in their arguments so you have to use DWARF debuginfo based probes, which is a pain. The tranche name alone doesn't let you identify which lock within a tranche is the current target.

I've attached a patch that adds the actual LWLock* to each tracepoint in the LWLock subsystem. That permits different locks to be tracked when handling tracepoint events within a single process.

Another patch adds tracepoints that were missing from LWLockUpdateVar and LWLockWaitForVar. And another removes a stray TRACE_POSTGRESQL_LWLOCK_ACQUIRE() in LWLockWaitForVar() which should not have been there, since the lock is not actually acquired by LWLockWaitForVar().

I'd hoped to add some sort of "index within the tranche" to tracepoints, but it looks like it's not feasible. It turns out to be basically impossible to get a stable identifier for an individual LWLock that is valid across different backends. A LWLock inside a DSM segment might have a different address in different backends depending on where the DSM segment got mapped. The LWLock subsystem doesn't keep track of them and doesn't have any way to map a LWLock pointer to any sort of cross-process-valid identifier. So that's a bit of a pain when tracing. To provide something stable I think  it'd be necessary to add some kind of counter tracked per-tranche and set by LWLockInitialize in the LWLock struct itself, which we sure don't want to do. If this ever becomes important for some reason we can probably look up whether the address is within a DSM segment or static shmem and compute some kind of relative address to report. For now you can't identify and compare individual locks within a tranche except for individual locks and named tranches.


By the way, the LWLock tracepoints currently fire T_NAME(lock) which calls GetLWTrancheName() for each tracepoint hit, so long as Pg is built with --enable-dtrace, even when nothing is actually tracing them. We might want to consider guarding them in systemtap tracepoint semaphore tests so they just become a predicted-away branch when not active. Doing so requires a small change to how we compile probes.d and the related makefile, but shouldn't be too hard. I haven't done that in this patch set.

Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: [PATCH] LWLock self-deadlock detection
Next
From: Heikki Linnakangas
Date:
Subject: Re: Deduplicate aggregates and transition functions in planner