Re: [PATCH] Identify LWLocks in tracepoints - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [PATCH] Identify LWLocks in tracepoints
Date
Msg-id CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com
Whole thread Raw
In response to [PATCH] Identify LWLocks in tracepoints  (Craig Ringer <craig.ringer@enterprisedb.com>)
Responses Re: [PATCH] Identify LWLocks in tracepoints  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Re: [PATCH] Identify LWLocks in tracepoints  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List pgsql-hackers
On Thu, 11 Mar 2021 at 15:57, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 10.03.21 06:38, Craig Ringer wrote:
> On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
>     On 1/22/21 6:02 AM, Peter Eisentraut wrote:
>
>     This patch set no longer applies:
>     http://cfbot.cputube.org/patch_32_2927.log
>     <http://cfbot.cputube.org/patch_32_2927.log>.
>
>     Can we get a rebase? Also marked Waiting on Author.
>
>
> Rebased as requested.

In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call moved?
  Is there some correctness issue?  If so, we should explain that (at
least in the commit message, or as a separate patch).

If you want I can split it out, or drop that change. I thought it was sufficiently inconsequential, but you're right to check.

The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means "releaseD". It's appropriate to emit this as soon as the lock could be acquired by anything else. By deferring it until we'd processed the waitlist and woken other backends the window during which the lock was reported as "held" was longer than it truly was, and it was easy to see one backend acquire the lock while another still appeared to hold it.

It'd possibly make more sense to have a separate TRACE_POSTGRESQL_LWLOCK_RELEASING just before the `pg_atomic_sub_fetch_u32` call. But I didn't want to spam the tracepoints too hard, and there's always going to be some degree of overlap because tracing tools cannot intercept and act during the atomic swap, so they'll always see a slightly premature or slightly delayed release. This window should be as short as possible though, hence moving the tracepoint.

Side note:

The main reason I didn't want to add more tracepoints than were strictly necessary is that Pg doesn't enable the systemtap semaphores feature, so right now we do a T_NAME(lock) evaluation each time we pass a tracepoint if --enable-dtrace is compiled in, whether or not anything is tracing. This was fine on pg11 where it was just:

#define T_NAME(lock) \
        (LWLockTrancheArray[(lock)->tranche])

but since pg13 it instead expands to

        GetLWTrancheName((lock)->tranche)

where GetLWTrancheName isn't especially trivial. We'll run that function every single time we pass any of these tracepoints and then discard the result, which is ... not ideal. That applies so long as Pg is compiled with --enable-dtrace. I've been meaning to look at enabling the systemtap semaphores feature in our build so these can be wrapped in unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()) guards, but I wanted to wrap this patch set up first as there are some complexities around enabling the semaphores feature.

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: Peter Geoghegan
Date:
Subject: Re: New IndexAM API controlling index vacuum strategies