Re: [PATCH] Identify LWLocks in tracepoints - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [PATCH] Identify LWLocks in tracepoints
Date
Msg-id CAGRY4nz-0dwJQdRucPi-q1j+FwT1ESYyR1VkXrOpdo4uzyQBaQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Identify LWLocks in tracepoints  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
On Wed, 13 Jan 2021 at 19:19, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Sat, Dec 19, 2020 at 01:00:01PM +0800, Craig Ringer wrote:
>
> The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
>
> This does not provide complete information on blockers, because it's not
> necessarily valid to compare any two LWLock* pointers between two process
> address spaces. The locks could be in DSM segments, and those DSM segments
> could be mapped at different addresses.
>
> I wasn't able to work out a sensible way to map a LWLock* to any sort of
> (tranche-id, lock-index) because there's no requirement that locks in a
> tranche be contiguous or known individually to the lmgr.
>
> Despite that, the patches improve the information available for LWLock
> analysis significantly.

Thanks for the patches, this could be indeed useful. I've looked through
and haven't noticed any issues with either the tracepoint extensions or
commentaries, except that I find it is not that clear how trance_id
indicates a re-initialization here?

    /* Re-initialization of individual LWLocks is not permitted */
    Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);

There should be no reason for anything to call LWLockInitialize(...) on an individual LWLock, since they are all initialized during postmaster startup.

Doing so must be a bug.

But that's a trivial change that can be done separately.
 
> Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
> it possible to differentiate between individual locks within a tranche, and
> (so long as they aren't tranches in a DSM segment) compare locks between
> processes. That means you can do lock-order analysis etc, which was not
> previously especially feasible.

I'm curious in which kind of situations lock-order analysis could be
helpful?

If code-path 1 does

    LWLockAcquire(LockA, LW_EXCLUSIVE);
    ...
    LWLockAcquire(LockB, LW_EXCLUSIVE);

and code-path 2 does:

    LWLockAcquire(LockB, LW_EXCLUSIVE);
    ...
    LWLockAcquire(LockA, LW_EXCLUSIVE);
 
then they're subject to deadlock. But you might not actually hit that often in test workloads if the timing required for the deadlock to occur is tight and/or occurs on infrequent operations.

It's not always easy to reason about or prove things about lock order when they're potentially nested deep within many layers of other calls and callbacks. Obviously something we try to avoid with LWLocks, but not impossible.

If you trace a workload and derive all possible nestings of lock acquire order, you can then prove things about whether there are any possible ordering conflicts and where they might arise.

A PoC to do so is on my TODO.

> Traces also don't have to do userspace reads for the tranche name all
> the time, so the trace can run with lower overhead.

This one is also interesting. Just for me to clarify, wouldn't there be
a bit of overhead anyway (due to switching from kernel context to user
space when a tracepoint was hit) that will mask name read overhead? Or
are there any available numbers about it?

I don't have numbers on that. Whether it matters will depend way too much on how you're using the probe points and collecting/consuming the data anyway.

It's a bit unfortunate (IMO) that we make a function call for each tracepoint invocation to get the tranche names. Ideally I'd prefer to be able to omit the tranche names lookups for these probes entirely for something as hot as LWLocks. But it's a bit of a pain to look up the tranche names from an external trace tool, so instead I'm inclined to see if we can enable systemtap's semaphores and only compute the tranche name if the target probe is actually enabled. But that'd be separate to this patch and require a build change in how systemtap support is compiled and linked.

BTW, a user->kernel->user context switch only occurs when the trace tool's probes use kernel space - such as for perf based probes, or for systemtap's kernel-runtime probes. The same markers can be used by e.g. systemtap's "dyninst" runtime that runs entirely in userspace.


pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Protect syscache from bloating with negative cache entries
Next
From: Craig Ringer
Date:
Subject: Re: [PATCH] Identify LWLocks in tracepoints