Thread: [PATCH] Identify LWLocks in tracepoints
Hi all
The attached patch set follows on from the discussion in [1] "Add LWLock blocker(s) information" by adding the actual LWLock* and the numeric tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
This does not provide complete information on blockers, because it's not necessarily valid to compare any two LWLock* pointers between two process address spaces. The locks could be in DSM segments, and those DSM segments could be mapped at different addresses.
I wasn't able to work out a sensible way to map a LWLock* to any sort of (tranche-id, lock-index) because there's no requirement that locks in a tranche be contiguous or known individually to the lmgr.
Despite that, the patches improve the information available for LWLock analysis significantly.
Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be fired from LWLockWaitForVar, despite that function never actually acquiring the lock.
Patch 2 adds the tranche id and lock pointer for each trace hit. This makes it possible to differentiate between individual locks within a tranche, and (so long as they aren't tranches in a DSM segment) compare locks between processes. That means you can do lock-order analysis etc, which was not previously especially feasible. Traces also don't have to do userspace reads for the tranche name all the time, so the trace can run with lower overhead.
Patch 3 adds a single-path tracepoint for all lock acquires and releases, so you only have to probe the lwlock__acquired and lwlock__release events to see all acquires/releases, whether conditional or otherwise. It also adds start markers that can be used for timing wallclock duration of LWLock acquires/releases.
Patch 4 adds some comments on LWLock tranches to try to address some points I found confusing and hard to understand when investigating this topic.
Attachment
On Mon, 28 Dec 2020 at 20:09, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi Craig,
On Sat, Dec 19, 2020 at 2:00 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
>
> Hi all
>
> The attached patch set follows on from the discussion in [1] "Add LWLock blocker(s) information" by adding the actual LWLock* and the numeric tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
>
> This does not provide complete information on blockers, because it's not necessarily valid to compare any two LWLock* pointers between two process address spaces. The locks could be in DSM segments, and those DSM segments could be mapped at different addresses.
>
> I wasn't able to work out a sensible way to map a LWLock* to any sort of (tranche-id, lock-index) because there's no requirement that locks in a tranche be contiguous or known individually to the lmgr.
>
> Despite that, the patches improve the information available for LWLock analysis significantly.
>
> Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be fired from LWLockWaitForVar, despite that function never actually acquiring the lock.
>
> Patch 2 adds the tranche id and lock pointer for each trace hit. This makes it possible to differentiate between individual locks within a tranche, and (so long as they aren't tranches in a DSM segment) compare locks between processes. That means you can do lock-order analysis etc, which was not previously especially feasible. Traces also don't have to do userspace reads for the tranche name all the time, so the trace can run with lower overhead.
>
> Patch 3 adds a single-path tracepoint for all lock acquires and releases, so you only have to probe the lwlock__acquired and lwlock__release events to see all acquires/releases, whether conditional or otherwise. It also adds start markers that can be used for timing wallclock duration of LWLock acquires/releases.
>
> Patch 4 adds some comments on LWLock tranches to try to address some points I found confusing and hard to understand when investigating this topic.
>
You sent in your patch to pgsql-hackers on Dec 19, but you did not
post it to the next CommitFest[1]. If this was intentional, then you
need to take no action. However, if you want your patch to be
reviewed as part of the upcoming CommitFest, then you need to add it
yourself before 2021-01-01 AoE[2]. Thanks for your contributions.
Regards,
[1] https://commitfest.postgresql.org/31/
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth
Thanks.
CF entry created at https://commitfest.postgresql.org/32/2927/ . I don't think it's urgent and will have limited review time so I didn't try to wedge it into the current CF.
On Sat, 19 Dec 2020 at 13:00, Craig Ringer <craig.ringer@enterprisedb.com> wrote:
Hi all
The attached patch set follows on from the discussion in [1] "Add LWLock blocker(s) information" by adding the actual LWLock* and the numeric tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
I've attached a systemtap script that makes use of the information exported by the enhanced LWLock tracepoints. It offers something akin to dynamic -DLWLOCK_STATS with automatic statistical aggregation and some selective LWLOCK_DEBUG output.
The script isn't super pretty. I didn't try to separate event-data collection from results output, and there's some duplication in places. But it gives you an idea what's possible when we report lock pointers and tranche IDs to tracepoints and add entry/exit tracepoints.
Key features:
* Collect statistical aggregates on lwlock hold and wait durations across all processes. Stats are grouped by lockmode (shared or exclusive) and by tranche name, as well as rollup stats across all tranches.
* Report lock wait and hold durations for each process when that process exits. Again, reported by mode and tranche name.
* For long lock waits, print the waiter pid and the blocker pid, along with each process's backend type and application_name (if known), the acquire mode, and the acquire function.
The output is intended to be human-readable, but it'd be quite simple to convert it into raw tsv-style output suitable for ingestion into statistical postprocessing or graphing tools.
It should be fairly easy to break down the stats by acquire function if desired, so that LWLockAcquire(), LWLockWaitForVar(), and LWLockAcquireOrWait() are reported separately; they're combined in the current output.
Capturing the current query string is pretty simple if needed, but I didn't think it was likely to be especially useful.
Sample output for a pg_regress run is attached; an abridged version follows. The !!W!! lines are "waited a long time", the !!H!! lines are "held a long time". Each line then has the format [pid]:MyBackendType tranche_name wait_time_us (wait_time) in wait_func (application_name) => [blocker_pid] (blocker_application_name). If the blocker pid wasn't identified it won't be reported - I know how to fix that and will do so soon.
!!W!! [ 93030]:3 BufferContent 12993 (0m0.012993s) in lwlock__acquire__start (pg_regress/text)
!!W!! [ 93036]:3 LockManager 14540 (0m0.014540s) in lwlock__acquire__start (pg_regress/float8) => [ 93045] (pg_regress/regproc)
!!W!! [ 93035]:3 BufferContent 12608 (0m0.012608s) in lwlock__acquire__start (pg_regress/float4) => [ 93034] (pg_regress/oid)
!!W!! [ 93036]:3 LockManager 10301 (0m0.010301s) in lwlock__acquire__start (pg_regress/float8)
!!W!! [ 93043]:3 LockManager 10356 (0m0.010356s) in lwlock__acquire__start (pg_regress/pg_lsn)
!!H!! [ 93033]:3 BufferContent 20579 (0m0.020579s) (pg_regress/int8)
!!W!! [ 93027]:3 BufferContent 10766 (0m0.010766s) in lwlock__acquire__start (pg_regress/char) => [ 93037] (pg_regress/bit)
!!W!! [ 93036]:3 OidGen 12876 (0m0.012876s) in lwlock__acquire__start (pg_regress/float8)
...
Then the summary rollup at the end of the run. This can also be output periodically during the run. Abbreviated for highlights:
wait locks: all procs tranche mode count total avg variance min max
W LW_EXCLUSIVE (all) E 54185 14062734 259 1850265 1 44177
W LW_SHARED (all) S 3668 1116022 304 1527261 2 18642
held locks: all procs tranche mode count total avg variance min max
H LW_EXCLUSIVE (all) E 10438060 153077259 14 37035 1 195043
H LW_SHARED (all) S 14199902 65466934 4 5318 1 44030
all procs by tranche tranche mode count total avg variance min max
W tranche (all) S 3668 1116022 304 1527261 2 18642
W tranche (all) E 54185 14062734 259 1850265 1 44177
W tranche WALInsert E 9839 2393229 243 1180294 2 14209
W tranche BufferContent E 3012 1726543 573 3869409 2 28186
W tranche BufferContent S 1664 657855 395 2185694 2 18642
W tranche LockFastPath E 28314 6327801 223 1278053 1 26133
W tranche LockFastPath S 87 59401 682 3703217 19 9454
W tranche LockManager E 7223 2764392 382 2514863 2 44177
Hope this is interesting to someone.
Attachment
On Wed, 13 Jan 2021 at 19:19, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Sat, Dec 19, 2020 at 01:00:01PM +0800, Craig Ringer wrote:
>
> The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
>
> This does not provide complete information on blockers, because it's not
> necessarily valid to compare any two LWLock* pointers between two process
> address spaces. The locks could be in DSM segments, and those DSM segments
> could be mapped at different addresses.
>
> I wasn't able to work out a sensible way to map a LWLock* to any sort of
> (tranche-id, lock-index) because there's no requirement that locks in a
> tranche be contiguous or known individually to the lmgr.
>
> Despite that, the patches improve the information available for LWLock
> analysis significantly.
Thanks for the patches, this could be indeed useful. I've looked through
and haven't noticed any issues with either the tracepoint extensions or
commentaries, except that I find it is not that clear how tranche_id
indicates a re-initialization here?
/* Re-initialization of individual LWLocks is not permitted */
Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);
There should be no reason for anything to call LWLockInitialize(...) on an individual LWLock, since they are all initialized during postmaster startup.
Doing so must be a bug.
But that's a trivial change that can be done separately.
> Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
> it possible to differentiate between individual locks within a tranche, and
> (so long as they aren't tranches in a DSM segment) compare locks between
> processes. That means you can do lock-order analysis etc, which was not
> previously especially feasible.
I'm curious in which kind of situations lock-order analysis could be
helpful?
If code-path 1 does
LWLockAcquire(LockA, LW_EXCLUSIVE);
...
LWLockAcquire(LockB, LW_EXCLUSIVE);
and code-path 2 does:
LWLockAcquire(LockB, LW_EXCLUSIVE);
...
LWLockAcquire(LockA, LW_EXCLUSIVE);
then they're subject to deadlock. But you might not actually hit that often in test workloads if the timing required for the deadlock to occur is tight and/or occurs on infrequent operations.
It's not always easy to reason about or prove things about lock order when they're potentially nested deep within many layers of other calls and callbacks. Obviously something we try to avoid with LWLocks, but not impossible.
If you trace a workload and derive all possible nestings of lock acquire order, you can then prove things about whether there are any possible ordering conflicts and where they might arise.
A PoC to do so is on my TODO.
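To make the idea concrete, here's a purely illustrative sketch in C (not the PoC mentioned above, and independent of any particular trace tool) of the post-processing step: feed it, per process, the set of locks already held each time an lwlock__acquired event fires, and it flags any pair of locks that has been observed being taken in both orders.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Illustrative only: lock identity is whatever the trace captured (e.g. LWLock*). */
    typedef struct
    {
        const void *held;       /* lock already held by the process */
        const void *acquired;   /* lock acquired while "held" was held */
    } LockPair;

    #define MAX_PAIRS 65536

    static LockPair observed[MAX_PAIRS];
    static size_t   n_observed;

    static bool
    pair_observed(const void *held, const void *acquired)
    {
        for (size_t i = 0; i < n_observed; i++)
            if (observed[i].held == held && observed[i].acquired == acquired)
                return true;
        return false;
    }

    /*
     * Call once per lwlock__acquired trace event, passing the locks the same
     * process already holds.  Records each (held, acquired) ordering and
     * reports a potential deadlock whenever the reverse ordering was also seen.
     */
    void
    note_acquire(const void **held_locks, size_t n_held, const void *acquired)
    {
        for (size_t i = 0; i < n_held; i++)
        {
            if (!pair_observed(held_locks[i], acquired) && n_observed < MAX_PAIRS)
                observed[n_observed++] = (LockPair) {held_locks[i], acquired};

            if (pair_observed(acquired, held_locks[i]))
                fprintf(stderr, "possible lock-order conflict: %p <-> %p\n",
                        held_locks[i], acquired);
        }
    }

In practice the "held" set falls out of pairing acquire/release events per pid, and the lock identity would need to be the tranche ID plus something stable across address spaces (or the raw LWLock* for non-DSM tranches), per the caveats earlier in the thread.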
> Traces also don't have to do userspace reads for the tranche name all
> the time, so the trace can run with lower overhead.
This one is also interesting. Just for me to clarify, wouldn't there be
a bit of overhead anyway (due to switching from kernel context to user
space when a tracepoint was hit) that will mask name read overhead? Or
are there any available numbers about it?
I don't have numbers on that. Whether it matters will depend way too much on how you're using the probe points and collecting/consuming the data anyway.
It's a bit unfortunate (IMO) that we make a function call for each tracepoint invocation to get the tranche names. Ideally I'd prefer to be able to omit the tranche names lookups for these probes entirely for something as hot as LWLocks. But it's a bit of a pain to look up the tranche names from an external trace tool, so instead I'm inclined to see if we can enable systemtap's semaphores and only compute the tranche name if the target probe is actually enabled. But that'd be separate to this patch and require a build change in how systemtap support is compiled and linked.
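For illustration, a guarded call site might look something like this (a sketch only; TRACE_POSTGRESQL_LWLOCK_ACQUIRE and its _ENABLED() companion are the macros dtrace/systemtap generates from probes.d, and T_NAME() is the existing tranche-name macro):

    /* Sketch: only evaluate T_NAME() when a tracer is actually attached. */
    if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED())
        TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);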
BTW, a user->kernel->user context switch only occurs when the trace tool's probes use kernel space - such as for perf based probes, or for systemtap's kernel-runtime probes. The same markers can be used by e.g. systemtap's "dyninst" runtime that runs entirely in userspace.
On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 2020-12-19 06:00, Craig Ringer wrote:
> Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
> fired from LWLockWaitForVar, despite that function never actually
> acquiring the lock.
This was added in 68a2e52bbaf when LWLockWaitForVar() was first
introduced. It looks like a mistake to me too, but maybe Heikki wants
to comment.
I'm certain it's a copy/paste bug.
On 2021-01-14 09:39, Craig Ringer wrote:
> On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> > On 2020-12-19 06:00, Craig Ringer wrote:
> > > Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be fired from LWLockWaitForVar, despite that function never actually acquiring the lock.
> >
> > This was added in 68a2e52bbaf when LWLockWaitForVar() was first introduced. It looks like a mistake to me too, but maybe Heikki wants to comment.
>
> I'm certain it's a copy/paste bug.

I have committed that patch.
On Thu, 11 Mar 2021 at 15:57, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 10.03.21 06:38, Craig Ringer wrote:
> On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net> wrote:
>
> On 1/22/21 6:02 AM, Peter Eisentraut wrote:
>
> This patch set no longer applies:
> http://cfbot.cputube.org/patch_32_2927.log.
>
> Can we get a rebase? Also marked Waiting on Author.
>
>
> Rebased as requested.
In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call moved?
Is there some correctness issue? If so, we should explain that (at
least in the commit message, or as a separate patch).
If you want I can split it out, or drop that change. I thought it was sufficiently inconsequential, but you're right to check.
The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means "releaseD". It's appropriate to emit this as soon as the lock could be acquired by anything else. By deferring it until we'd processed the waitlist and woken other backends the window during which the lock was reported as "held" was longer than it truly was, and it was easy to see one backend acquire the lock while another still appeared to hold it.
It'd possibly make more sense to have a separate TRACE_POSTGRESQL_LWLOCK_RELEASING just before the `pg_atomic_sub_fetch_u32` call. But I didn't want to spam the tracepoints too hard, and there's always going to be some degree of overlap because tracing tools cannot intercept and act during the atomic swap, so they'll always see a slightly premature or slightly delayed release. This window should be as short as possible though, hence moving the tracepoint.
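Concretely, the intent is roughly the following placement (an abbreviated sketch of LWLockRelease(), paraphrased rather than the actual patch hunk):

    /* Release my hold on the lock; from here on other backends can take it. */
    oldstate = pg_atomic_sub_fetch_u32(&lock->state,
                                       mode == LW_EXCLUSIVE ? LW_VAL_EXCLUSIVE : LW_VAL_SHARED);

    /* Fire the "released" probe here, as close to the atomic update as we can get... */
    TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));

    /* ...rather than only after we've also processed the waitlist and woken waiters. */
    if (check_waiters)
        LWLockWakeup(lock);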
Side note:
The main reason I didn't want to add more tracepoints than were strictly necessary is that Pg doesn't enable the systemtap semaphores feature, so right now we do a T_NAME(lock) evaluation each time we pass a tracepoint if --enable-dtrace is compiled in, whether or not anything is tracing. This was fine on pg11 where it was just:
#define T_NAME(lock) \
(LWLockTrancheArray[(lock)->tranche])
but since pg13 it instead expands to
GetLWTrancheName((lock)->tranche)
where GetLWTrancheName isn't especially trivial. We'll run that function every single time we pass any of these tracepoints and then discard the result, which is ... not ideal. That applies so long as Pg is compiled with --enable-dtrace. I've been meaning to look at enabling the systemtap semaphores feature in our build so these can be wrapped in unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()) guards, but I wanted to wrap this patch set up first as there are some complexities around enabling the semaphores feature.
On 18.03.21 07:34, Craig Ringer wrote:
> > In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call moved? Is there some correctness issue? If so, we should explain that (at least in the commit message, or as a separate patch).
>
> If you want I can split it out, or drop that change. I thought it was sufficiently inconsequential, but you're right to check.
>
> The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means "releaseD". It's appropriate to emit this as soon as the lock could be acquired by anything else. By deferring it until we'd processed the waitlist and woken other backends the window during which the lock was reported as "held" was longer than it truly was, and it was easy to see one backend acquire the lock while another still appeared to hold it.

From the archeology department: The TRACE_POSTGRESQL_LWLOCK_RELEASE probe was in the right place until PG 9.4, but was then moved by ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, which was a major rewrite, so it seems the move might have been accidental. The documentation specifically states that the probe is triggered before waiters are woken up, which it specifically does not do at the moment. So this looks like a straight bug fix to me.
On Sat, 20 Mar 2021, 04:21 Peter Eisentraut, <peter.eisentraut@enterprisedb.com> wrote:
On 18.03.21 07:34, Craig Ringer wrote:
> The main reason I didn't want to add more tracepoints than were strictly
> necessary is that Pg doesn't enable the systemtap semaphores feature, so
> right now we do a T_NAME(lock) evaluation each time we pass a tracepoint
> if --enable-dtrace is compiled in, whether or not anything is tracing.
> This was fine on pg11 where it was just:
>
> #define T_NAME(lock) \
> (LWLockTrancheArray[(lock)->tranche])
>
> but since pg13 it instead expands to
>
> GetLWTrancheName((lock)->tranche)
>
> where GetLWTrancheName isn't especially trivial. We'll run that function
> every single time we pass any of these tracepoints and then discard the
> result, which is ... not ideal. That applies so long as Pg is compiled
> with --enable-dtrace. I've been meaning to look at enabling the
> systemtap semaphores feature in our build so these can be wrapped in
> unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()) guards, but I wanted
> to wrap this patch set up first as there are some complexities around
> enabling the semaphores feature.
There is already support for that. See the documentation at the end of
this page:
https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS
Pretty sure it won't work right now.
To use systemtap semaphores (the _ENABLED macros) you need to run dtrace -g to generate a probes.o then link that into postgres.
I don't think we do that. I'll double check soon.
On 19.03.21 21:06, Peter Eisentraut wrote:
> On 18.03.21 07:34, Craig Ringer wrote:
> > > In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call moved? Is there some correctness issue? If so, we should explain that (at least in the commit message, or as a separate patch).
> >
> > If you want I can split it out, or drop that change. I thought it was sufficiently inconsequential, but you're right to check.
> >
> > The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means "releaseD". It's appropriate to emit this as soon as the lock could be acquired by anything else. By deferring it until we'd processed the waitlist and woken other backends the window during which the lock was reported as "held" was longer than it truly was, and it was easy to see one backend acquire the lock while another still appeared to hold it.
>
> From the archeology department: The TRACE_POSTGRESQL_LWLOCK_RELEASE probe was in the right place until PG 9.4, but was then moved by ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, which was a major rewrite, so it seems the move might have been accidental. The documentation specifically states that the probe is triggered before waiters are woken up, which it specifically does not do at the moment. So this looks like a straight bug fix to me.

committed a fix for that
On 10.03.21 06:38, Craig Ringer wrote:
> On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net> wrote:
>
> > On 1/22/21 6:02 AM, Peter Eisentraut wrote:
> >
> > This patch set no longer applies: http://cfbot.cputube.org/patch_32_2927.log.
> >
> > Can we get a rebase? Also marked Waiting on Author.
>
> Rebased as requested.
>
> I'm still interested in whether Andres will be able to do anything about identifying LWLocks in a cross-backend manner. But this work doesn't really depend on that; it'd benefit from it, but would be easily adapted to it later if needed.

First, a problem: 0002 doesn't build on macOS, because uint64 has been used in the probe definitions. That needs to be handled like the other nonnative types in that file.

All the probe changes and additions should be accompanied by documentation changes.

The probes used to have an argument to identify the lock, which was removed by 3761fe3c20bb040b15f0e8da58d824631da00caa. The 0001 patch is essentially trying to reinstate that, which seems sensible. Perhaps we should also use the argument order that used to be there. It used to be

probe lwlock__acquire(const char *, int, LWLockMode);

and now it would be

probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);

Also, do we need both the tranche name and the tranche id? Or maybe we don't need the name, or can record it differently, which might also address your other concern that it's too expensive to compute. In any case, I think an argument order like

probe lwlock__acquire(const char *, int, LWLock*, LWLockMode);

would make more sense.

In 0004, you add a probe to record the application_name setting? Would there be any value in making that a generic probe that can record any GUC change?
On Mon, 22 Mar 2021 at 17:00, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 20.03.21 01:29, Craig Ringer wrote:
> There is already support for that. See the documentation at the end of
> this page:
> https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS
>
>
> Pretty sure it won't work right now.
>
> To use systemtap semaphores (the _ENABLED macros) you need to run dtrace
> -g to generate a probes.o then link that into postgres.
>
> I don't think we do that. I'll double check soon.
We do that. (It's -G.)
Huh. I could've sworn we didn't. My mistake, it's there in src/backend/Makefile .
In that case I'll amend the patch to use semaphore guards.
(On a side note, systemtap's semaphore support is actually a massive pain. The way it's implemented in <sys/sdt.h> means that a single compilation unit may not use both probes.d style markers produced by the dtrace script and use regular DTRACE_PROBE(providername,probename) preprocessor macros. If it attempts to do so, the DTRACE_PROBE macros will emit inline asm that tries to reference probename_semaphore symbols that will not exist, resulting in linker errors or runtime link errors. But that's really a systemtap problem. Core PostgreSQL doesn't use any explicit DTRACE_PROBE(...), STAP_PROBE(...) etc.)
On Mon, 22 Mar 2021 at 16:38, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
First, a problem: 0002 doesn't build on macOS, because uint64 has been
used in the probe definitions. That needs to be handled like the other
nonnative types in that file.
Will fix.
All the probe changes and additions should be accompanied by
documentation changes.
Agreed, will fix.
The probes used to have an argument to identify the lock, which was
removed by 3761fe3c20bb040b15f0e8da58d824631da00caa.
Huh. That's exactly the functionality I was looking for. Damn. I understand why Robert removed it, but its removal makes it much harder to identify an LWLock since it might fall in a DSM segment that could be mapped at different base addresses in different backends.
Robert's patch didn't replace the offset within tranche with anything else to identify the lock. A LWLock* is imperfect due to ASLR and DSM but it's better than nothing. In theory we could even remap them in trace tools if we had tracepoints on DSM attach and detach that showed their size and base address too.
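For example, purely hypothetically (these probes don't exist and aren't part of this patch set), firing something like the following at DSM map and unmap time would give a tool enough to translate a DSM-resident LWLock* into a (segment, offset) pair and back into another backend's mapping:

    /* Hypothetical probes; the segment fields named here are illustrative. */
    TRACE_POSTGRESQL_DSM_ATTACH(seg->handle, seg->mapped_address, seg->mapped_size);
    TRACE_POSTGRESQL_DSM_DETACH(seg->handle, seg->mapped_address, seg->mapped_size);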
CC'ing Andres, as he expressed interest in being able to globally identify LWLocks too.
The 0001 patch is
essentially trying to reinstate that, which seems sensible. Perhaps we
should also use the argument order that used to be there. It used to be
probe lwlock__acquire(const char *, int, LWLockMode);
and now it would be
probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
Also, do we need both the tranche name and the tranche id?
Reasons to have the name:
* There is no easy way to look up the tranche name by ID from outside the backend
* A tranche ID by itself is pretty much meaningless especially for dynamic tranches
* Any existing scripts will rely on the tranche name
So the tranche name is really required to generate useful output for any dynamic tranches, or simple and readable output from things like perf.
Reasons to have the tranche ID:
* The tranche name is not guaranteed to have the same address for a given value across backends in the presence of ASLR, even for built-in tranches. So tools need to read tranche names as user-space strings, which is much more expensive than consuming an int argument from the trace args. Storing and reporting maps of events by tranche name (string) in tools is also more expensive than having a tranche id.
* When the trace tool or script wants to filter for only one particular tranche,particularly when it's a built-in tranche where the tranche ID is known, having the ID is much more useful and efficient.
* If we can avoid computing the tranche name, emitting just the tranche ID would be much faster.
It's annoying that we have to pay the cost of computing the tranche name though. It never used to matter, but now that T_NAME() expands to GetLWTrancheName() calls as of 29c3e2dd5a6 it's going to cost a little more on such a hot path. I might see if I can do a little comparison and see how much.
I could add TRACE_POSTGRESQL_<<tracepointname>>_ENABLED() guards since we do in fact build with SDT semaphore support. That adds a branch for each tracepoint, but they're already marshalling arguments and making a function call that does lots more than a single branch, so that seems pretty sensible. The main downside of using _ENABLED() USDT semaphore guards is that not all tools are guaranteed to understand or support them. So an older perf, for example, might simply fail to fire events on guarded probes. That seems OK to me, the onus should be on the probe tool to pay any costs, not on PostgreSQL. Despite that I don't want to mark the _ENABLED() guards unlikely(), since that'd increase the observer effect where probing LWLocks changes their timing and behaviour. Branch prediction should do a very good job in such cases without being forced.
I wonder a little about the possible cache costs of the _ENABLED() macros though. Their data is in a separate ELF segment and separate .o, with no locality to the traced code. It might be worth checking that before proceeding; I guess it's even possible that the GetLWTrancheName() calls could be cheaper. Will see if I can run some checks and report back.
BTW, if you want some of the details on how userspace SDTs work, https://leezhenghui.github.io/linux/2019/03/05/exploring-usdt-on-linux.html is interesting and useful. It helps explain uprobes, ftrace, bcc, etc.
Or maybe we
don't need the name, or can record it differently, which might also
address your other concern that it's too expensive to compute. In any
case, I think an argument order like
probe lwlock__acquire(const char *, int, LWLock*, LWLockMode);
would make more sense.
OK.
In 0004, you add a probe to record the application_name setting? Would
there be any value in making that a generic probe that can record any
GUC change?
Yes, there would, but I didn't want to go and do that in the same patch, and a named probe on application_name is useful separately to having probes on any GUC.
There's value in having a probe with an easily targeted name that probes the application_name since it's of obvious interest and utility to probing and tracing tools. A probe specifically on application_name means a probing script doesn't have to fire an event for every GUC, copy the GUC name string, strcmp() it to see if it's the GUC of interest, etc. So specific probes on "major" GUCs like this are IMO very useful.
(It'd be possible to instead generate probes for each GUC at compile-time using the preprocessor and the DTRACE_ macros. But as noted above, that doesn't currently work properly in the same compilation unit that a dtrace script-generated probes.h is included in. I think it's probably nicer to have specific probes for GUCs of high interest, then generic probes that capture all GUCs anyway.)
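To illustrate the trade-off (the probe names here are hypothetical, not from the patch; "name" and "newval" stand for the GUC's name and new value at the call site): a dedicated marker can be attached to directly and fires only for the GUC of interest, while a generic marker fires for every GUC and leaves the copying and filtering to the tracer.

    /* Specific probe: no filtering needed in the trace tool. */
    TRACE_POSTGRESQL_GUC_APPLICATION_NAME_ASSIGNED(newval);

    /* Generic probe: fires for every GUC; the tool must read and compare the name. */
    TRACE_POSTGRESQL_GUC_ASSIGNED(name, newval);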
There are a TON of probes I want to add, and I have a tree full of them waiting to submit progressively. Yes, ability to probe all GUCs is in there. So is detail on walsender, reorder buffer, and snapshot builder activity. Heavyweight lock SDTs. A probe that identifies the backend type at startup. SDT probe events emitted for every wait-event. Probes in elog.c to let probes observe error unwinding, capture error messages, etc. (Those can also be used with systemtap guru mode scripts to do things like turn a particular elog(DEBUG) into a PANIC at runtime for diagnostic purposes). Probes in shm_mq to observe message passing and blocking. A probe that fires whenever debug_query_string changes. Lots. But I can't submit them all at once, especially without some supporting use cases and scripts that other people can use so they can understand why these probes are useful.
So I figured I'd start here...
Hi,

On 2021-04-12 14:31:32 +0800, Craig Ringer wrote:
> * There is no easy way to look up the tranche name by ID from outside the backend

But it's near trivial to add that.

> It's annoying that we have to pay the cost of computing the tranche name though. It never used to matter, but now that T_NAME() expands to GetLWTrancheName() calls as of 29c3e2dd5a6 it's going to cost a little more on such a hot path. I might see if I can do a little comparison and see how much. I could add TRACE_POSTGRESQL_<<tracepointname>>_ENABLED() guards since we do in fact build with SDT semaphore support. That adds a branch for each tracepoint, but they're already marshalling arguments and making a function call that does lots more than a single branch, so that seems pretty sensible.

I am against adding any overhead for this feature. I honestly think the probes we have right now in postgres do not provide a commensurate benefit.

> (It'd be possible to instead generate probes for each GUC at compile-time using the preprocessor and the DTRACE_ macros. But as noted above, that doesn't currently work properly in the same compilation unit that a dtrace script-generated probes.h is included in. I think it's probably nicer to have specific probes for GUCs of high interest, then generic probes that capture all GUCs anyway.)
>
> There are a TON of probes I want to add, and I have a tree full of them waiting to submit progressively. Yes, ability to probe all GUCs is in there. So is detail on walsender, reorder buffer, and snapshot builder activity. Heavyweight lock SDTs. A probe that identifies the backend type at startup. SDT probe events emitted for every wait-event. Probes in elog.c to let probes observe error unwinding, capture error messages, etc. [...] A probe that fires whenever debug_query_string changes. Lots. But I can't submit them all at once, especially without some supporting use cases and scripts that other people can use so they can understand why these probes are useful.

-1. This is not scalable. Adding static probes all over has both a runtime (L1I, branches, code optimization) and maintenance overhead.

> (Those can also be used with systemtap guru mode scripts to do things like turn a particular elog(DEBUG) into a PANIC at runtime for diagnostic purposes).

Yikes.

Greetings,

Andres Freund
On Tue, 13 Apr 2021 at 02:23, Andres Freund <andres@anarazel.de> wrote:

[I've changed the order of the quoted sections a little to prioritize the key stuff]

> On 2021-04-12 14:31:32 +0800, Craig Ringer wrote:
>
> > It's annoying that we have to pay the cost of computing the tranche name though. It never used to matter, but now that T_NAME() expands to GetLWTrancheName() calls as of 29c3e2dd5a6 it's going to cost a little more on such a hot path. I might see if I can do a little comparison and see how much. I could add TRACE_POSTGRESQL_<<tracepointname>>_ENABLED() guards since we do in fact build with SDT semaphore support. That adds a branch for each tracepoint, but they're already marshalling arguments and making a function call that does lots more than a single branch, so that seems pretty sensible.
>
> I am against adding any overhead for this feature. I honestly think the probes we have right now in postgres do not provide a commensurate benefit.

I agree that the probes we have now are nearly useless, if not entirely useless. The transaction management ones are misplaced and utterly worthless. The LWLock ones don't carry enough info to be much use and are incomplete. I doubt anybody uses any of them at all, or would even notice their absence.

In terms of overhead, what is in place right now is not free. It used to be very cheap, but since 29c3e2dd5a6 it's not. I'd like to reduce the current cost and improve functionality at the same time, so it's actually useful.

> > * There is no easy way to look up the tranche name by ID from outside the backend
>
> But it's near trivial to add that.

Really? We can expose a pg_catalog.lwlock_tranches view that lets you observe the current mappings for any given user backend I guess. But if I'm looking for performance issues caused by excessive LWLock contention or waits, LWLocks held too long, LWLock lock-ordering deadlocks, or the like, it's something I want to capture across the whole postgres instance. Each backend can have different tranche IDs (right?) and there's no way to know what a given non-built-in tranche ID means for any given backend without accessing backend-specific in-memory state. Including for non-user-accessible backends like bgworkers and auxprocs, where it's not possible to just query the state from a view directly.

So we'd be looking at some kind of shm based monstrosity. That doesn't sound appealing. Worse, there's no way to solve races with it - is a given tranche ID already allocated when you see it? If not, can you look it up from the backend before the backend exits/dies? For that matter, how do you do that, since the connection to the backend is likely under the control of an application, not your monitoring and diagnostic tooling. Some trace tools can poke backend memory directly, but it generally requires debuginfo, is fragile and Pg version specific, slow, and a real pain to use.

If we don't attach the LWLock names to the tracepoints in some way they're pretty worthless.

Again, I don't plan to add new costs here. I'm actually proposing to reduce an existing cost. And you can always build without `--enable-dtrace` and ... just not care.

Anyway - I'll do some `perf` runs shortly to quantify this:

* With/without tracepoints at all
* With/without names in tracepoints
* With/without tracepoint refcounting (_ENABLED() semaphores)

so as to rely less on handwaving.
> > (Those can also be used with systemtap guru mode scripts to do things like turn a particular elog(DEBUG) into a PANIC at runtime for diagnostic purposes).
>
> Yikes.

Well, it's not like it can happen by accident. You have to deliberately write a script that twiddles process memory, using a tool that requires special privileges. I recently had to prepare a custom build for a customer that converted an elog(DEBUG) into an elog(PANIC) in order to capture a core with much better diagnostic info for a complex, hard to reproduce and intermittent memory management issue. It would've been rather nice to be able to do so with a trace marker instead of a custom build.

> > There are a TON of probes I want to add, and I have a tree full of them waiting to submit progressively. Yes, ability to probe all GUCs is in there. So is detail on walsender, reorder buffer, and snapshot builder activity. Heavyweight lock SDTs. A probe that identifies the backend type at startup. SDT probe events emitted for every wait-event. Probes in elog.c to let probes observe error unwinding, capture error messages, etc. [...] A probe that fires whenever debug_query_string changes. Lots. But I can't submit them all at once, especially without some supporting use cases and scripts that other people can use so they can understand why these probes are useful.
>
> -1. This is not scalable. Adding static probes all over has both a runtime (L1I, branches, code optimization) and maintenance overhead.

Take a look at "sudo perf list".

    sched:sched_kthread_work_execute_end      [Tracepoint event]
    sched:sched_kthread_work_execute_start    [Tracepoint event]
    ...
    sched:sched_migrate_task                  [Tracepoint event]
    ...
    sched:sched_process_exec                  [Tracepoint event]
    ...
    sched:sched_process_fork                  [Tracepoint event]
    ...
    sched:sched_stat_iowait                   [Tracepoint event]
    ...
    sched:sched_stat_sleep                    [Tracepoint event]
    sched:sched_stat_wait                     [Tracepoint event]
    ...
    sched:sched_switch                        [Tracepoint event]
    ...
    sched:sched_wakeup                        [Tracepoint event]

The kernel is packed with extremely useful trace events, and for very good reasons. Some on very hot paths.

I do _not_ want to randomly add probes everywhere. I propose that they be added:

* Where they will meaningfully aid production diagnosis, complex testing, and/or development activity. Expose high level activity of key subsystems via trace markers, especially at the boundaries of IPCs or where logic otherwise passes between processes.
* Where it's not feasible to instead adjust code structure to make DWARF debuginfo based probing sufficient.
* Where there's no other sensible way to get useful information without excessive complexity and/or runtime cost, but it could be very important for understanding intermittent production issues or performance problems at scale in live systems.
* Where the execution path is not extremely hot - e.g. no static tracepoints in spinlocks or atomics.
* Where a DWARF debuginfo based probe cannot easily replace them, i.e. generally not placed on entry and exit of stable and well-known functions.

Re the code structure point above, we have lots of functions that return in multiple places, or where a single function can do many different things with different effects on system state. For example right now it's quite complex to place probes to definitively confirm the outcome of a given transaction and capture its commit record lsn.
Functions with many branches that each fiddle with system state, functions that test for the validity of some global and short-circuit return if invalid, etc. Functions that do long loops over big chunks of logic are hard too, e.g. ReorderBufferCommit.

I want to place probes where they will greatly simplify observation of important global system state that's not easily observed using traditional tools like gdb or logging.

When applied sensibly and moderately, trace markers are absolutely amazing for diagnostic and performance work. You can attach to them in production builds even without debuginfo and observe behaviour that would otherwise be impossible without complex fiddling around with multi-process gdb. This sort of capability is going to become more and more important as we become more parallel and can rely less on single-process gdb-style tracing. Diagnostics using logging is a blunt hammer that does not scale and is rarely viable for intermittent or hard to reproduce production issues.

I will always favour "native postgres" solutions where feasible - for example, I want to add some basic reorder buffer state to struct WalSnd and the pg_stat_replication views, and I want to expose some means to get a walsender to report details of its ReorderBuffer state. But some things are not very amenable to that. Either the runtime costs of having the facility available are too high (we're never going to have a pg_catalog.pg_lwlocks for good reasons) or it's too complicated to write and maintain. Especially where info is needed from many processes. That's where trace markers become valuable.

But right now what we have in Pg is worthless, and it seems almost nobody knows how to use the tools. I want to change that, but it's a bit of a catch-22. Making tooling easy to use benefits enormously from some more stable interfaces that don't break so much version-to-version, don't require deep code knowledge to understand, and work without debuginfo on production builds. But without some "oh, wow" tools, it's hard to convince anyone we should invest any effort in improving the infrastructure...

It's possible I'm beating a dead horse here. I find these tools amazingly useful, but they're currently made 10x harder than they need to be by the complexities of directly poking at postgres's complex and version-specific internal structure using debuginfo based probing. Things that should be simple, like determining the timings of a txn from xid assignment -> 2pc prepare -> 2pc commit prepared .... really aren't. Markers that report xid assignment, commit, rollback, etc, with the associated topxid would help immensely.
Hi,

On 2021-04-13 10:34:18 +0800, Craig Ringer wrote:
> > But it's near trivial to add that.
>
> Really?

Yes.

> Each backend can have different tranche IDs (right?)

No, they have to be the same in each. Note how the tranche ID is part of struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock etc.

> But if I'm looking for performance issues caused by excessive LWLock contention or waits, LWLocks held too long, [...] or the like, it's something I want to capture across the whole postgres instance.

Sure. Although I still don't really buy that static tracepoints are the best way to measure this kind of thing, given the delay introducing them and the cost of having them around. I think I pointed out https://postgr.es/m/20200813004233.hdsdfvufqrbdwzgr%40alap3.anarazel.de before.

> LWLock lock-ordering deadlocks

This seems unrelated to tracepoints to me.

> and there's no way to know what a given non-built-in tranche ID means for any given backend without accessing backend-specific in-memory state. Including for non-user-accessible backends like bgworkers and auxprocs, where it's not possible to just query the state from a view directly.

The only per-backend part is that some backends might not know the tranche name for dynamically registered tranches where the LWLockRegisterTranche() hasn't been executed in a backend. Which should pretty much never be an aux process or such. And even for bgworkers it seems like a pretty rare thing, because those need to be started by something...

It might be worth proposing a shared hashtable with tranche names and just reserving enough space for ~hundred entries...

> And you can always build without `--enable-dtrace` and ... just not care.

Practically speaking, distributions enable it, which then incurs the cost for everyone.

> Take a look at "sudo perf list".
>
> sched:sched_kthread_work_execute_end      [Tracepoint event]
> sched:sched_kthread_work_execute_start    [Tracepoint event]
> ...
> sched:sched_migrate_task                  [Tracepoint event]
> ...
> sched:sched_process_exec                  [Tracepoint event]
> ...
> sched:sched_process_fork                  [Tracepoint event]
> ...
> sched:sched_stat_iowait                   [Tracepoint event]
> ...
> sched:sched_stat_sleep                    [Tracepoint event]
> sched:sched_stat_wait                     [Tracepoint event]
> ...
> sched:sched_switch                        [Tracepoint event]
> ...
> sched:sched_wakeup                        [Tracepoint event]
>
> The kernel is packed with extremely useful trace events, and for very good reasons. Some on very hot paths.

IIRC those aren't really comparable - the kernel actually does modify the executable code to replace the tracepoints with nops.

Greetings,

Andres Freund
On Tue, 13 Apr 2021 at 11:06, Andres Freund <andres@anarazel.de> wrote:

> > Each backend can have different tranche IDs (right?)
>
> No, they have to be the same in each. Note how the tranche ID is part of struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock etc.

Ah. I misunderstood that at some point. That makes it potentially more sensible to skip reporting tranche names. Not great, because it's much less convenient to work with trace data full of internal ordinals that must be re-mapped in post-processing. But I'm generally OK with deferring runtime costs to tooling rather than the db itself so long as doing so is moderately practical.

In this case, I think we could likely get away with removing the tranche names from the tracepoints if we instead emit a trace event on each dynamic tranche registration that reports the tranche id -> name mapping. It still sucks for tools, since they have to scrape up the static tranche registrations from somewhere else, but ... it'd be tolerable.

> > The kernel is packed with extremely useful trace events, and for very good reasons. Some on very hot paths.
>
> IIRC those aren't really comparable - the kernel actually does modify the executable code to replace the tracepoints with nops.

Same with userspace static trace markers (USDTs).

A followup mail will contain a testcase and samples to demonstrate this.

> Although I still don't really buy that static tracepoints are the best way to measure this kind of thing, given the delay introducing them and the cost of having them around. I think I pointed out https://postgr.es/m/20200813004233.hdsdfvufqrbdwzgr%40alap3.anarazel.de before.

Yeah. Semaphores are something hot enough that I'd hesitate to touch them.

> > LWLock lock-ordering deadlocks
>
> This seems unrelated to tracepoints to me.

If I can observe which locks are acquired in which order by each proc, I can then detect excessive waits and report the stack of held locks of both procs and their order of acquisition. Since LWLocks shmem state doesn't AFAICS track any information on the lock holder(s) I don't see a way to do this in-process. It's not vital, it's just one of the use cases I have in mind. I suspect that any case where such deadlocks are possible represents a misuse of LWLocks anyway.

> > and there's no way to know what a given non-built-in tranche ID means for any given backend without accessing backend-specific in-memory state. Including for non-user-accessible backends like bgworkers and auxprocs, where it's not possible to just query the state from a view directly.
>
> The only per-backend part is that some backends might not know the tranche name for dynamically registered tranches where the LWLockRegisterTranche() hasn't been executed in a backend. Which should pretty much never be an aux process or such. And even for bgworkers it seems like a pretty rare thing, because those need to be started by something...
>
> It might be worth proposing a shared hashtable with tranche names and just reserving enough space for ~hundred entries...

Yeah, that'd probably work and be cheap enough not to really matter. Might even save us a chunk of memory by not turning CoW pages into private mappings for each backend during registration.

> > And you can always build without `--enable-dtrace` and ... just not care.
>
> Practically speaking, distributions enable it, which then incurs the cost for everyone.

Yep.
That's part of why I was so surprised to notice the GetLWTrancheName() function call in LWLock tracepoints. Nearly anywhere else it wouldn't matter at all, but LWLocks are hot enough that it just might matter for the no-wait fastpath.
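A minimal sketch of the registration-time idea mentioned above, assuming the existing LWLockRegisterTranche() signature and a hypothetical probe name:

    void
    LWLockRegisterTranche(int tranche_id, const char *tranche_name)
    {
        /* ... existing per-backend bookkeeping that records the name ... */

        /* Hypothetical: report the id -> name mapping once, at registration time,
         * so tools can resolve bare tranche IDs later without a name argument on
         * every lock event. */
        TRACE_POSTGRESQL_LWLOCK_TRANCHE_REGISTER(tranche_id, tranche_name);
    }

Built-in tranches would still need to come from a table the tool knows about, which is the "scrape up the static tranche registrations from somewhere else" part.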
On Tue, 13 Apr 2021 at 21:05, Craig Ringer <craig.ringer@enterprisedb.com> wrote:
> On Tue, 13 Apr 2021 at 11:06, Andres Freund <andres@anarazel.de> wrote:
> > IIRC those aren't really comparable - the kernel actually does modify
> > the executable code to replace the tracepoints with nops.
>
> Same with userspace static trace markers (USDTs).
>
> A followup mail will contain a testcase and samples to demonstrate this.

Demo follows, with source attached too. gcc 10.2, compiling with -O2, using dtrace and <sys/sdt.h> from systemtap 4.4.

Trivial empty function definition:

    __attribute__((noinline)) void no_args(void)
    {
        SDT_NOOP_NO_ARGS();
    }

Disassembly when SDT_NOOP_NO_ARGS is defined as

    #define SDT_NOOP_NO_ARGS()

is:

    <no_args>:
        retq

When built with a probes.d definition processed by the dtrace script instead, the disassembly becomes:

    <no_args>:
        nop
        retq

So ... yup, it's a nop.

Now, if we introduce semaphores, that changes.

    __attribute__((noinline)) void no_args(void)
    {
        if (SDT_NOOP_NO_ARGS_ENABLED())
            SDT_NOOP_NO_ARGS();
    }

disassembles to:

    <no_args>:
        cmpw   $0x0,0x2ec4(%rip)        # <sdt_noop_no_args_semaphore>
        jne    <no_args+0x10>
        retq
        nopl   0x0(%rax,%rax,1)
        nop
        retq

so the semaphore test is actually quite harmful and wasteful in this case. That's not surprising, since this SDT is a simple marker point. But what if we supply arguments to it?

It turns out that the disassembly is the same if args are passed, whether locals or globals, including globals assigned based on program input that can't be determined at compile time. Still just a nop.

If I pass a function call as an argument expression to a probe, e.g.

    __attribute__((noinline)) static int compute_probe_argument(void)
    {
        return 100;
    }

    void with_computed_arg(void)
    {
        SDT_NOOP_WITH_COMPUTED_ARG(compute_probe_argument());
    }

then the disassembly with SDTs is:

    <with_computed_arg>:
        callq  <compute_probe_argument>
        nop
        retq

so the function call isn't elided even if it's unused. That's somewhat expected. The same will be true if the arguments to a probe require pointer chasing or non-trivial marshalling.

If a semaphore guard is added, this becomes:

    <with_computed_arg>:
        cmpw   $0x0,0x2e2e(%rip)        # <sdt_noop_with_computed_arg_semaphore>
        jne    <with_computed_arg+0x10>
        retq
        nopl   0x0(%rax,%rax,1)
        callq  <compute_probe_argument>
        nop
        retq

so now the call to compute_probe_argument() is skipped unless the probe is enabled, but the function is longer and requires a test and jump.

If I dummy up a function that does some pointer chasing, without semaphores I get:

    <with_pointer_chasing>:
        mov    (%rdi),%rax
        mov    (%rax),%rax
        mov    (%rax),%rax
        nop
        retq

so the arguments are marshalled then ignored. With semaphores I get:

    <with_pointer_chasing>:
        cmpw   $0x0,0x2d90(%rip)        # <sdt_noop_with_pointer_chasing_semaphore>
        jne    <with_pointer_chasing+0x10>
        retq
        nopl   0x0(%rax,%rax,1)
        mov    (%rdi),%rax
        mov    (%rax),%rax
        mov    (%rax),%rax
        nop
        retq

so again the probe's argument marshalling is inline in the function body, but at the end, and skipped over.

Findings:

* A probe without arguments or with simple arguments is just a 'nop' instruction
* Probes that require function calls, pointer chasing, other expression evaluation etc may impose a fixed cost to collect up arguments even if the probe is disabled.
* SDT semaphores can avoid that cost but add a branch, so should probably be avoided unless preparing probe arguments is likely to be expensive.

Hideous but effective demo code attached.
Attachment
On Tue, 13 Apr 2021 at 21:40, Craig Ringer <craig.ringer@enterprisedb.com> wrote:
> Findings:
>
> * A probe without arguments or with simple arguments is just a 'nop' instruction
> * Probes that require function calls, pointer chasing, other
> expression evaluation etc may impose a fixed cost to collect up
> arguments even if the probe is disabled.
> * SDT semaphores can avoid that cost but add a branch, so should
> probably be avoided unless preparing probe arguments is likely to be
> expensive.

Back to the topic directly at hand.

Attached a disassembly of what LWLockAcquire looks like now on my current build of git master @ 5fe83adad9efd5e3929f0465b44e786dc23c7b55. This is an --enable-debug --enable-cassert --enable-dtrace build with -Og -ggdb3.

The three tracepoints in it are all of the form:

    movzwl 0x0(%r13),%edi
    call   0x801c49 <GetLWTrancheName>
    nop

so it's clear we're doing redundant calls to GetLWTrancheName(), as expected. Not ideal.

Now if I patch it to add the _ENABLED() guards on all the tracepoints, the probes look like this:

    0x0000000000803176 <+200>: cmpw   $0x0,0x462da8(%rip)   # 0xc65f26 <postgresql_lwlock__acquire_semaphore>
    0x000000000080317e <+208>: jne    0x80328b <LWLockAcquire+477>
    .... other interleaved code ...
    0x000000000080328b <+477>: movzwl 0x0(%r13),%edi
    0x0000000000803290 <+482>: call   0x801c49 <GetLWTrancheName>
    0x0000000000803295 <+487>: nop
    0x0000000000803296 <+488>: jmp    0x803184 <LWLockAcquire+214>

so we avoid the GetLWTrancheName() call at the cost of a test and possible branch, and a small expansion in total function size.

Without the semaphores, LWLockAcquire is 463 bytes. With them, it's 524 bytes, which is nothing to sneeze at for code that doesn't do anything 99.999% of the time, but we avoid a bunch of GetLWTrancheName() calls.

If I instead replace T_NAME() with NULL for all tracepoints in LWLockAcquire, the disassembly shows that the tracepoints now become a simple

    0x0000000000803176 <+200>: nop

which is pretty hard to be concerned about.

So at the very least we should be calling GetLWTrancheName() once at the start of the function if built with dtrace support and remembering the value, instead of calling it for each tracepoint. And omitting the tranche name looks like it might be sensible for the LWLock code. In most places it won't matter, but LWLocks are hot enough that it possibly might. A simple pg_regress run hits LWLockAcquire 25 million times, so that's 75 million calls to GetLWTrancheName().
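To make the two mitigations concrete, they would look something like this in C (a sketch only, not the exact committed form; the _ENABLED() macros are the ones dtrace/systemtap generate for each probe alongside the probe macro itself):

    /* Option 1: semaphore guard - only evaluate the arguments, including
     * the tranche name lookup, when a tracer is actually attached */
    if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED())
        TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);

    /* Option 2: look the tranche name up once per LWLockAcquire() call
     * and reuse it for every tracepoint in the function */
    const char *tranche_name = T_NAME(lock);
    ...
    TRACE_POSTGRESQL_LWLOCK_WAIT_START(tranche_name, mode);
    ...
    TRACE_POSTGRESQL_LWLOCK_ACQUIRE(tranche_name, mode);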
On Mon, Apr 12, 2021 at 11:06 PM Andres Freund <andres@anarazel.de> wrote:
> No, they have to be the same in each. Note how the tranche ID is part of
> struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock
> etc.

More precisely, if a tranche ID is defined in multiple backends, it needs to be defined the same way in all of them. But it is possible to have an extension loaded into some backends and not others and have it define a tranche ID that other backends know nothing about.

Another point to note is that, originally, I had an idea that each tranche of lwlocks was situated in a single array somewhere in memory. Perhaps that was an array of something else, like buffer descriptors, and the lwlocks were just one element of the struct, or maybe it was an array specifically of LWLocks, but one way or the other, there was definitely one array that had all the LWLocks from that tranche in it. So before the commit in question -- 3761fe3c20bb040b15f0e8da58d824631da00caa -- T_ID() used to compute an offset for a lock within the tranche that was supposed to uniquely identify the lock. However, the whole idea of an array per tranche turns out to be broken by design.

Consider parallel query. You could, perhaps, arrange for all the LWLocks that a particular query needs to be in one tranche. And that's all fine. But what if there are multiple parallel contexts in existence at the same time? I think right now that may be impossible as a practical matter, since for example an SQL function that is called by a parallel query is supposed to run any SQL statements inside of it without parallelism. But, that's basically a policy decision. There's nothing in the parallel context machinery itself which prevents multiple parallel contexts from being active at the same time. And if that happens, then you'd have multiple arrays with the same tranche ID, so how do you identify the locks then? The pre-3761fe3c20bb040b15f0e8da58d824631da00caa data structure doesn't work because it has only one place to store an array base, but having multiple places to store an array base doesn't fix it either because now you've just given the same identifier to multiple locks.

You could maybe fix it by putting a limit on how many parallel contexts can be open at the same time, and then having N copies of each parallelism-related tranche. But that seems ugly and messy and a burden on extension authors and not really what anybody wants. You could try to identify locks by pointer addresses, but that's got security hazards and the addresses aren't portable across all the backends involved in the parallel query because of how DSM works, so it's not really that helpful in terms of matching stuff up. You could identify every lock by a tranche ID + an array offset + a "tranche instance ID". But where would you store the tranche instance ID to make it readily accessible, other than in the lock itself? Andres wasn't thrilled about using even 2 bytes to identify the LWLock, so he'll probably like having more bytes in there for that kind of thing even less. And to be honest I wouldn't blame him. We only need 12 bytes to implement the lock itself -- we can't justify having more than a couple of additional bytes for debugging purposes.

On a broader level, I agree that being able to find out what the system is doing is really important. But I'm also not entirely convinced that having really fine-grained information here to distinguish between one lock and another is the way to get there.
Personally, I've never run into a problem where I really needed to know anything more than the tranche name. Like, I've seen problems where, for example, we can see that there's a lot of contention on SubtransSLRULock, or that there are problems with WALInsertLock. But I can't really see why I'd need to know which WALInsertLock was experiencing contention.

If we were speaking of buffer content locks, I suppose I can imagine wanting more details, but it's not really the buffer number I'd want to know. I'd want to know the database OID, the relfilenode, the fork number, and the block number. You can argue that we should just expose the buffer number and let the user sort out the rest with dtrace/systemtap magic, but that makes it useless in practice to an awful lot of people, including me.

-- Robert Haas EDB: http://www.enterprisedb.com
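For concreteness, the identity being described here is what lives in the buffer tag of the BufferDesc that holds the content lock; a rough sketch of resolving it (PG13/14-era field names, locking ignored for brevity):

    BufferDesc *bufHdr = GetBufferDescriptor(buf_id);   /* buf_id is 0-based */

    /* database OID, relfilenode, fork and block for the page the lock protects */
    elog(DEBUG1, "db=%u rel=%u fork=%d block=%u",
         bufHdr->tag.rnode.dbNode,
         bufHdr->tag.rnode.relNode,
         bufHdr->tag.forkNum,
         bufHdr->tag.blockNum);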
Hi,

On 2021-04-13 14:25:23 -0400, Robert Haas wrote:
> On Mon, Apr 12, 2021 at 11:06 PM Andres Freund <andres@anarazel.de> wrote:
> You could identify every lock by a tranche ID + an array offset + a
> "tranche instance ID". But where would you store the tranche instance
> ID to make it readily accessible, other than in the lock itself?
> Andres wasn't thrilled about using even 2 bytes to identify the
> LWLock, so he'll probably like having more bytes in there for that
> kind of thing even less.

I still don't like the two bytes, fwiw ;). Especially because it's 4 bytes due to padding right now.

I'd like to move the LWLock->waiters list to outside of the lwlock itself - at most TotalProcs LWLocks can be waited for, so we don't need millions of empty proclist_heads. That way we could also remove the proclist indirection - which shows up a fair bit in contended workloads.

And if we had a separate "lwlocks being waited for" structure, we could also add more information to it if we wanted to...

The difficulty of course is having space to indicate which of these "waiting for" lists are being used - there's not enough space in ->state right now to represent that. Two possible approaches:

- We could make it work if we restricted MAX_BACKENDS to be 2**14 - but while I personally think that's a sane upper limit, I already had a hard time getting consensus to lower the limit to 2^18-1.

- Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as well as an offset to one of MAX_BACKENDS "wait lists" into LWLock.

It's not so much that I want to lower the overall memory usage (although it doesn't hurt). It's more about being able to fit more data into one cacheline together with the lwlock. E.g. being able to fit more into BufferDesc would be very useful.

A secondary benefit of such an approach would be that it makes it a lot easier to add efficient adaptive spinning on contended locks. I did experiment with that, and there's some considerable potential for performance benefits there. But for it to scale well we need something similar to "mcs locks", to avoid causing too much contention. And that pretty much requires some separate space to store wait information anyway.

With an 8 byte state we probably could also stash the tranche inside that...

> On a broader level, I agree that being able to find out what the
> system is doing is really important. But I'm also not entirely
> convinced that having really fine-grained information here to
> distinguish between one lock and another is the way to get there.
> Personally, I've never run into a problem where I really needed to
> know anything more than the tranche name.

I think it's quite useful for relatively simple things like analyzing the total amount of time spent in individual locks, without incurring much overhead when not doing so (for which you need to identify individual locks, otherwise your end - start time is going to be meaningless).

And, slightly more advanced, for analyzing what the stack was when the lock was released - which then allows you to see what work you're blocked on, something I found hard to figure out otherwise.

I found that that's mostly quite doable with dynamic probes though.

> Like, I've seen problems where, for example, we can see that there's a lot of
> contention on SubtransSLRULock, or that there are problems with
> WALInsertLock. But I can't really see why I'd need to know which
> WALInsertLock was experiencing contention.

Well, but you might want to know what the task blocking you was doing.
What to optimize might differ depending on whether the other task is e.g. a log switch (which acquires all insert locks) or WAL writes by VACUUM.

> If we were speaking of buffer content locks, I suppose I can imagine
> wanting more details, but it's not really the buffer number I'd want
> to know. I'd want to know the database OID, the relfilenode, the fork
> number, and the block number. You can argue that we should just expose
> the buffer number and let the user sort out the rest with
> dtrace/systemtap magic, but that makes it useless in practice to an
> awful lot of people, including me.

I have wondered if we ought to put some utilities for that in contrib or such. It's a lot easier to address something new with a decent starting point...

Greetings,

Andres Freund
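A rough sketch of the direction being floated above, purely to visualize it (the type and field names below are invented; this is not the current struct LWLock nor a worked-out proposal):

    /* One wait-list slot per backend suffices, since a backend waits on at
     * most one LWLock at a time. */
    typedef struct LWLockWaitSlot
    {
        proclist_head   waiters;    /* backends sleeping on the owning lock */
        /* room for extra instrumentation: holder, wait start time, ... */
    } LWLockWaitSlot;

    typedef struct LWLockSlim
    {
        /* shared/exclusive holder counts and flags, plus the index of the
         * LWLockWaitSlot currently in use, packed into one 64-bit atomic */
        pg_atomic_uint64 state;
    } LWLockSlim;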
On Wed, 14 Apr 2021 at 04:46, Andres Freund <andres@anarazel.de> wrote: > > On 2021-04-13 14:25:23 -0400, Robert Haas wrote: > > On Mon, Apr 12, 2021 at 11:06 PM Andres Freund <andres@anarazel.de> wrote: > > You could identify every lock by a tranche ID + an array offset + a > > "tranche instance ID". But where would you store the tranche instance > > ID to make it readily accessible, other than in the lock itself? > > Andres wasn't thrilled about using even 2 bytes to identify the > > LWLock, so he'll probably like having more bytes in there for that > > kind of thing even less. > > I still don't like the two bytes, fwiw ;). Especially because it's 4 > bytes due to padding right now. Aha, did I hear you say "there are two free bytes for me to shove something marginally useful and irrelevant into"? (*grin*) > I'd like to move the LWLock->waiters list to outside of the lwlock > itself - at most TotalProcs LWLocks can be waited for, so we don't need > millions of empty proclist_heads. That way we could also remove the > proclist indirection - which shows up a fair bit in contended workloads. > > And if we had a separate "lwlocks being waited for" structure, we could > also add more information to it if we wanted to... Having the ability to observe LWLock waiters would be nice. But you're right to constantly emphasise that LWLocks need to be very slim. We don't want to turn them into near-heavyweight locks by saddling them with overhead that's not strictly necessary. A simple pg_regress run (with cassert, admittedly) takes 25,000,000 LWLocks and does 24,000 LWLock waits and 130,000 condvar waits. All in less than a minute. OTOH, once someone's waiting we don't care nearly as much about bookkeeping cost, it's the un-contested fast paths that're most critical. > - We could make it work if we restricted MAX_BACKENDS to be 2**14 - but > while I personally think that's a sane upper limit, I already had a > hard time getting consensus to lower the limit to 2^18-1. 16384 backends is totally fine in sane real world deployments. But it'll probably upset marketing people when OtherDatabaseVendor starts shouting that they support 14 million connections, and postgres only has 16k. Sigh. The real answer here in the long term probably needs to be decoupling of executors from connection state inside postgres. But while we're on that topic, how about we convert the entire codebase to Rust while riding on a flying rainbow unicorn? We're stuck with the 1:1 connection to executor mapping for the foreseeable future. > - Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as > well as an offset to one of MAX_BACKENDS "wait lists" into LWLock. You know much more than me about the possible impacts of that on layout and caching, but I gather that it's probably undesirable to make LWLocks any bigger. > I think it's quite useful for relatively simple things like analyzing > the total amount of time spent in individual locks, without incuring > much overhead when not doing so (for which you need to identify > individual locks, otherwise your end - start time is going to be > meaningless). Yep. That's why the removal of the lock offset is a bit frustrating, because you can't identify an LWLock instance-wide by LWLock* due to the possibility of different per-backend DSM base-address mappings. Well, and ASLR on EXEC_BACKEND systems, but who cares about those? The removal was for good reasons though. And it only affects LWLocks in DSM, for everything else the LWLock* is good enough. 
If identifying LWLocks in DSM ever matters enough to bother to solve that problem, we can emit trace events on DSM mapping attach in each backend, and trace tools can do the work to track which LWLocks are in DSM and convert their addresses to a reference base address. Pg shouldn't have to pay the price for that unless it's something a lot of people need. > And, slightly more advanced, for analyzing what the stack > was when the lock was released - which then allows you to see what work > you're blocked on, something I found hard to figure out otherwise. > > I found that that's mostly quite doable with dynamic probes though. Yeah, it is. That's part of why my patchset here doesn't try to do a lot to LWLock tracepoints - I didn't think it was necessary to add a lot. The LWLock code is fairly stable, not usually something you have to worry about in production unless you're debugging badly behaved extensions, and usually somewhat probe-able with DWARF based dynamic probes. However, the way the wait-loop and fast-path are in the same function is a serious pain for dynamic probing; you can't tell the difference between a fast-path acquire and an acquire after a wait without using probes on function+offset or probing by source line. Both those are fine for dev work but useless in tool/library scripts. I almost wonder if we should test out moving the LWLock wait-loops out of LWLockAcquire(), LWLockAcquireOrWait() and LWLockWaitForVar() anyway, so the hot parts of the function are smaller. That'd make dynamic probing more convenient as a pleasant side effect. I imagine you must've tried this, benchmarked and profiled it, though, and found it to be a net loss, otherwise you surely would've done it as part of your various (awesome) performance work. Anyway, there are some other areas of postgres that are ridiculously painful to instrument with dynamic probes, especially in a somewhat version- and build-independent way. Tracking txn commit and abort (including 2PC and normal xacts, with capture of commit LSNs) is just painful with dynamic probing for example, and is one of my top priority areas to get some decent tracepoints for - the current txn management tracepoints are utterly worthless. But LWLocks are mostly fine, the only really big piece missing is a tracepoint fired exactly once when a lock is released by any release path. > > Like, I've seen problems for example we can see that there's a lot of > > contention on SubtransSLRULock, or there's problems with > > WALInsertLock. But I can't really see why I'd need to know which > > WALInsertLock was experiencing contention. > > Well, but you might want to know what the task blocking you was > doing. What to optimize might differ if the other task is e.g. a log > switch (which acquires all insert locks), than if it's WAL writes by > VACUUM. That sort of thing is why I've been interested in IDing the LWLock. That, and I work with extension code that probably abuses LWLocks a bit, but that's not a problem core postgres should have to care about. > > If we were speaking of buffer content locks, I suppose I can imagine > > wanting more details, but it's not really the buffer number I'd want > > to know. I'd want to know the database OID, the relfilenode, the fork > > number, and the block number. You can argue that we should just expose > > the buffer number and let the user sort out the rest with > > dtrace/systemtap magic, but that makes it useless in practice to an > > awful lot of people, including me. 
> > I have wondered if we ought to put some utilities for that in contrib or > such. It's a lot easier to address something new with a decent starting > point... Long term that's exactly what I want to do. I wrote some with systemtap, but it's since become clear to me that systemtap isn't going to get enough people on board. Setup is a hassle. So I'm trying to pivot over to bpf tools now, with the intention of getting together a library of canned probes and example scripts to help people get started. I've written systemtap scripts that can track a transaction from localxid allocation through real xid allocation, commit (or 2pc prepare and commit prepared), logical decoding, reorder buffering, output plugin processing, receipt by a logical downstream, downstream xid assignment, downstream commit and replorigin advance, downstream->upstream feedback, upstream slot advance, and upstream confirmed flush/catalog_xmin advance. Whole-lifecycle tracking with timing of each phase, across multiple processes and two postgres instances. For now the two postgres instances must be on the same host, but that could be dealt with too. The script reports the application name and pid of the upstream session, the upstream localxid, the upstream xid, upstream commit lsn, the downstream xid and the downstream commit lsn as it goes, and follows associations to track the transaction through its lifecycle. (The current script is written for BDR and pglogical, not in-core logical decoding, but the principles are the same). The problem is that scripts like this are just too fragile right now. Changes across Pg versions break the dynamic function probes they use, though that can be adapted to somewhat. The bigger problem is the number of places I have to insert statement (function+offset) probes, which are just too fragile to make these sorts of scripts generally useful. I have to fix them whenever I want to use them, so there's not much point trying to get people to use them. But it's hard to convince people of the value of static tracepoints that would make this sort of thing so much easier to do in a more stable manner when they can't easily see hands-on examples of what's possible. There's no "wow" factor. So I need to address the worst of the difficult-to-probe sites and start sharing some examples that use them. I thought this would be low-hanging fruit to start with. Whoops!
On Wed, 14 Apr 2021 at 02:25, Robert Haas <robertmhaas@gmail.com> wrote: > So before the commit in question -- > 3761fe3c20bb040b15f0e8da58d824631da00caa -- T_ID() used to compute an > offset for a lock within the tranche that was supposed to uniquely > identify the lock. However, the whole idea of an array per tranche > turns out to be broken by design. Yeah, I understand that. I'd really love it if a committer could add an explanatory comment or two in the area though. I'm happy to draft a comment patch if anyone agrees my suggestion is sensible. The key things I needed to know when studying the code were: * A LWLock* is always part of a tranche, but locks within a given tranche are not necessarily co-located in memory, cross referenced or associated in any way. * A LWLock tranche does not track how many LWLocks are in the tranche or where they are in memory. It only groups up LWLocks into categories and maps the tranche ID to a name. * Not all LWLocks are part of the main LWLock array; others can be embedded in shmem structs elsewhere, including in DSM segments. * LWLocks in DSM segments may not have the same address between different backends, because the DSM base address can vary, so a LWLock* cannot be reliably compared between backends unless you know it's in the main LWLock array or in static shmem. Having that info in lwlock.c near the tranche management code or the tranche and main lwlock arrays would've been very handy. > You could try to identify locks by pointer addresses, but that's got > security hazards and the addreses aren't portable across all the > backends involved in the parallel query because of how DSM works, so > it's not really that helpful in terms of matching stuff up. What I'm doing now is identifying them by LWLock* across backends. I keep track of DSM segment mappings in each backend inside the trace script and I relocate LWLock* pointers known to be inside DSM segments relative to a dummy base address so they're equal across backends. It's a real pain, but it works. The main downside is that the trace script has to observe the DSM attach; if it's started once a backend already has the DSM segment attached, it has no idea the LWLock is in a DSM segment or how to remap it. But that's not a serious issue. > On a broader level, I agree that being able to find out what the > system is doing is really important. But I'm also not entirely > convinced that having really fine-grained information here to > distinguish between one lock and another is the way to get there. At the start of this thread I would've disagreed with you. But yeah, you and Andres are right, because the costs outweigh the benefits here. I'm actually inclined to revise the patch I sent in order to *remove* the LWLock name from the tracepoint argument. At least for the fast-path tracepoints on start-of-acquire and end-of-acquire. I think it's probably OK to report it in the lock wait tracepoints, which is where it's most important to have anyway. So the tracepoint will always report the LWLock* and tranche ID, but it won't report the tranche name for the fast-path. I'll add trace events for tranche ID registration, so trace tools can either remember the tranche ID->name mappings from there, or capture them from lock wait events and remember them. Reasonable? That way we retain the most important trace functionality, but we reduce the overheads.
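A sketch of what the registration-side trace event could look like (the probe name below is invented for illustration and would need a matching probes.d entry):

    void
    LWLockRegisterTranche(int tranche_id, const char *tranche_name)
    {
        /* ... existing registration bookkeeping ... */

        /* hypothetical probe: lets tracers learn the tranche id -> name
         * mapping once, instead of reading the name on every lock event */
        TRACE_POSTGRESQL_LWLOCK_TRANCHE_REGISTER(tranche_id, tranche_name);
    }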
On Wed, 14 Apr 2021 at 10:41, Craig Ringer <craig.ringer@enterprisedb.com> wrote: > On Wed, 14 Apr 2021 at 02:25, Robert Haas <robertmhaas@gmail.com> wrote: > > You could try to identify locks by pointer addresses, but that's got > > security hazards and the addreses aren't portable across all the > > backends involved in the parallel query because of how DSM works, so > > it's not really that helpful in terms of matching stuff up. > > What I'm doing now is identifying them by LWLock* across backends. I > keep track of DSM segment mappings in each backend inside the trace > script and I relocate LWLock* pointers known to be inside DSM segments > relative to a dummy base address so they're equal across backends. BTW, one of the reasons I did this was to try to identify BDR and pglogical code that blocks or sleeps while holding a LWLock. I got stuck on that for other reasons, so it didn't go anywhere, but those issues are now resolved so I should probably return to it at some point. It'd be a nice thing to be able to run on postgres itself too.
On 12.04.21 07:46, Craig Ringer wrote: > > To use systemtap semaphores (the _ENABLED macros) you need to run > dtrace > > -g to generate a probes.o then link that into postgres. > > > > I don't think we do that. I'll double check soon. > > We do that. (It's -G.) > > > Huh. I could've sworn we didn't. My mistake, it's there in > src/backend/Makefile . > > In that case I'll amend the patch to use semaphore guards. This whole thread is now obviously moved to consideration for PG15, but I did add an open item about this particular issue (https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items, search for "dtrace"). So if you could produce a separate patch that adds the _ENABLED guards targeting PG14 (and PG13), that would be helpful.
On Tue, Apr 13, 2021 at 4:46 PM Andres Freund <andres@anarazel.de> wrote: > I still don't like the two bytes, fwiw ;). Especially because it's 4 > bytes due to padding right now. I'm not surprised by that disclosure. But I think it's entirely worth it. Making wait states visible in pg_stat_activity isn't the most useful thing I've ever done to PostgreSQL, but it's far from the least useful. If we can get the same benefit with less overhead, that's great. > I'd like to move the LWLock->waiters list to outside of the lwlock > itself - at most TotalProcs LWLocks can be waited for, so we don't need > millions of empty proclist_heads. That way we could also remove the > proclist indirection - which shows up a fair bit in contended workloads. > > And if we had a separate "lwlocks being waited for" structure, we could > also add more information to it if we wanted to... > > The difficulty of course is having space to indicate which of these > "waiting for" lists are being used - there's not enough space in ->state > right now to represent that. Two possibile approaches: > > - We could make it work if we restricted MAX_BACKENDS to be 2**14 - but > while I personally think that's a sane upper limit, I already had a > hard time getting consensus to lower the limit to 2^18-1. > > - Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as > well as an offset to one of MAX_BACKENDS "wait lists" into LWLock. I'd rather not further reduce MAX_BACKENDS. I still think some day we're going to want to make that bigger again. Maybe not for a while, admittedly. But, do you need to fit this into "state"? If you just replaced "waiters" with a 32-bit integer, you'd save 4 bytes and have bits left over (and maybe restrict the tranche ID to 2^14 and squeeze that in too, as you mention). -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Apr 13, 2021 at 10:42 PM Craig Ringer <craig.ringer@enterprisedb.com> wrote: > I'd really love it if a committer could add an explanatory comment or > two in the area though. I'm happy to draft a comment patch if anyone > agrees my suggestion is sensible. The key things I needed to know when > studying the code were: > > * A LWLock* is always part of a tranche, but locks within a given > tranche are not necessarily co-located in memory, cross referenced or > associated in any way. > * A LWLock tranche does not track how many LWLocks are in the tranche > or where they are in memory. It only groups up LWLocks into categories > and maps the tranche ID to a name. > * Not all LWLocks are part of the main LWLock array; others can be > embedded in shmem structs elsewhere, including in DSM segments. > * LWLocks in DSM segments may not have the same address between > different backends, because the DSM base address can vary, so a > LWLock* cannot be reliably compared between backends unless you know > it's in the main LWLock array or in static shmem. > > Having that info in lwlock.c near the tranche management code or the > tranche and main lwlock arrays would've been very handy. I'm willing to review a comment patch along those lines. > I'm actually inclined to revise the patch I sent in order to *remove* > the LWLock name from the tracepoint argument. At least for the > fast-path tracepoints on start-of-acquire and end-of-acquire. I think > it's probably OK to report it in the lock wait tracepoints, which is > where it's most important to have anyway. So the tracepoint will > always report the LWLock* and tranche ID, but it won't report the > tranche name for the fast-path. I'll add trace events for tranche ID > registration, so trace tools can either remember the tranche ID->name > mappings from there, or capture them from lock wait events and > remember them. > > Reasonable? That way we retain the most important trace functionality, > but we reduce the overheads. Reducing the overheads is good, but I have no opinion on what's important for people doing tracing, because I am not one of those people. -- Robert Haas EDB: http://www.enterprisedb.com
On 14.04.21 15:20, Peter Eisentraut wrote: > On 12.04.21 07:46, Craig Ringer wrote: >> > To use systemtap semaphores (the _ENABLED macros) you need to run >> dtrace >> > -g to generate a probes.o then link that into postgres. >> > >> > I don't think we do that. I'll double check soon. >> >> We do that. (It's -G.) >> >> >> Huh. I could've sworn we didn't. My mistake, it's there in >> src/backend/Makefile . >> >> In that case I'll amend the patch to use semaphore guards. > > This whole thread is now obviously moved to consideration for PG15, but > I did add an open item about this particular issue > (https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items, search for > "dtrace"). So if you could produce a separate patch that adds the > _ENABLED guards targeting PG14 (and PG13), that would be helpful. Here is a proposed patch for this.
Attachment
On Thu, 29 Apr 2021 at 15:31, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > So if you could produce a separate patch that adds the > > _ENABLED guards targeting PG14 (and PG13), that would be helpful. > > Here is a proposed patch for this. LGTM. Applies and builds fine on master and (with default fuzz) on REL_13_STABLE. Works as expected. This does increase the size of LWLockAcquire() etc slightly but since it skips these function calls, and the semaphores are easily predicted, I don't have any doubt it's a net win. +1 for merge.
On Wed, 14 Apr 2021, 22:29 Robert Haas, <robertmhaas@gmail.com> wrote:
On Tue, Apr 13, 2021 at 10:42 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
> I'd really love it if a committer could add an explanatory comment or
> two in the area though. I'm happy to draft a comment patch if anyone
> agrees my suggestion is sensible. The key things I needed to know when
> studying the code were:
> [...]
I'm willing to review a comment patch along those lines.
Cool. I'll draft soon.
I've since noticed that some of the info is present, but it's in lwlock.h, whereas in Pg comment detail is more often than not in the .c file.
I prefer it in headers myself anyway, since it's more available to tools like doxygen. I'll add a few "see lwlock.h" hints and a short para about appropriate lwlock use to the .c file's comments, etc., and post on a separate thread soon.
> I'm actually inclined to revise the patch I sent in order to *remove*
> the LWLock name from the tracepoint argument.
Reducing the overheads is good, but I have no opinion on what's
important for people doing tracing, because I am not one of those
people.
Truthfully I'm not convinced anyone is "those people" right now. I don't think anyone is likely to be making serious use of them due to their limitations.
Certainly that'll be the case for the txn ones which are almost totally useless. They only track the localxid lifecycle, they don't track real txid allocation, WAL writing, commit (wal or shmem), etc.
> On Fri, Apr 30, 2021 at 11:23:56AM +0800, Craig Ringer wrote:
> On Wed, 14 Apr 2021, 22:29 Robert Haas, <robertmhaas@gmail.com> wrote:
> > > I'm actually inclined to revise the patch I sent in order to *remove*
> > > the LWLock name from the tracepoint argument.
> >
> > Reducing the overheads is good, but I have no opinion on what's
> > important for people doing tracing, because I am not one of those
> > people.
>
> Truthfully I'm not convinced anyone is "those people" right now. I don't
> think anyone is likely to be making serious use of them due to their
> limitations.

I would like to mention that tracepoints can be useful not only directly; they also:

* convey information, from the point of view of the developers who wrote the code, about what is important enough to trace.

* declare more stable tracing points within the code, which are somewhat more reliable between versions.

E.g. when writing bcc scripts one is also somewhat limited in the use of those tracepoints by the requirement to provide a specific pid, but one can still get a better understanding of what to look at (maybe using other methods).
On 30.04.21 05:22, Craig Ringer wrote: > On Thu, 29 Apr 2021 at 15:31, Peter Eisentraut > <peter.eisentraut@enterprisedb.com> wrote: >>> So if you could produce a separate patch that adds the >>> _ENABLED guards targeting PG14 (and PG13), that would be helpful. >> >> Here is a proposed patch for this. > > LGTM. > > Applies and builds fine on master and (with default fuzz) on > REL_13_STABLE. Works as expected. committed
Hi,

On 2021-05-03 21:06:30 +0200, Peter Eisentraut wrote:
> On 30.04.21 05:22, Craig Ringer wrote:
> > On Thu, 29 Apr 2021 at 15:31, Peter Eisentraut
> > <peter.eisentraut@enterprisedb.com> wrote:
> > > > So if you could produce a separate patch that adds the
> > > > _ENABLED guards targeting PG14 (and PG13), that would be helpful.
> > >
> > > Here is a proposed patch for this.
> >
> > LGTM.
> >
> > Applies and builds fine on master and (with default fuzz) on
> > REL_13_STABLE. Works as expected.
>
> committed

I'm now getting

/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquire’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1322:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1322 |   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1345:57: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1345 |   TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1355:54: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1355 |   TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockConditionalAcquire’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1407:64: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1407 |   TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1415:59: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1415 |   TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquireOrWait’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1488:59: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1488 |   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1507:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1507 |   TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1538:68: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1538 |   TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1547:63: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1547 |   TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockWaitForVar’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1708:66: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1708 |   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1728:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1728 |   TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockRelease’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1855:48: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
 1855 |   TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));

In a build without the trace stuff enabled.

Greetings,

Andres Freund
On Wed, 5 May 2021, 06:15 Andres Freund, <andres@anarazel.de> wrote:
Hi,
warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^
Odd that I didn't get that.
I'll send a patch to revise shortly.
On Wed, 5 May 2021 at 09:15, Craig Ringer <craig.ringer@enterprisedb.com> wrote: >> warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] >> 1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); >> | ^ > > Odd that I didn't get that. This compiler complaint is not due to the _ENABLED() test as such. It's due to the fact that we completely define out the TRACE_POSTGRESQL_ macros with src/backend/utils/Gen_dummy_probes.sed . While explicit braces could be added around each test, I suggest fixing Gen_dummy_probes.sed to emit the usual dummy statement instead. Patch attached.
Attachment
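To illustrate the shape of the change (the macro bodies below are illustrative; see the attached patch for the real edit): in a build without --enable-dtrace, the generated dummy probes.h defines each probe away entirely, so the new "if (..._ENABLED()) TRACE_...(...)" guards leave an empty if body. Emitting the usual do-nothing statement keeps the body non-empty:

    /* before: the probe macro expands to nothing, so the guarded call
     * becomes "if (0) ;", which is what -Wempty-body complains about */
    #define TRACE_POSTGRESQL_LWLOCK_WAIT_START(INT1, INT2)

    /* after: expand to the usual dummy statement instead */
    #define TRACE_POSTGRESQL_LWLOCK_WAIT_START(INT1, INT2) do {} while (0)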
On 05.05.21 00:15, Andres Freund wrote:
> I'm now getting
>
> /home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquire’:
> /home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1322:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
>  1322 |   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);

For clarification, -Wempty-body is not part of the default warnings, right? And even if I turn it on explicitly, I don't get this warning. I read something that newer compilers don't warn in cases of macro expansion. What compiler are you using in this situation?
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 05.05.21 00:15, Andres Freund wrote:
>> I'm now getting
>> /home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquire’:
>> /home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1322:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
>> 1322 |   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);

> What compiler are you using in this situation?

All of these buildfarm members are now showing this warning:

    calliphoridae   gcc (Debian 10.1.0-6) 10.1.0
    culicidae       gcc (Debian 10.1.0-6) 10.1.0
    flaviventris    gcc (Debian 20200124-1) 10.0.1 20200124 (experimental)
    francolin       gcc (Debian 10.1.0-6) 10.1.0
    piculet         gcc (Debian 10.1.0-6) 10.1.0
    rorqual         gcc (Debian 10.1.0-6) 10.1.0
    serinus         gcc (Debian 20200124-1) 10.0.1 20200124 (experimental)
    skink           gcc (Debian 10.1.0-6) 10.1.0

so there's your answer. (I wonder why flaviventris and serinus are still using an "experimental" compiler version that is now behind mainstream.)

regards, tom lane
Hi,

On 2021-05-08 13:13:47 -0400, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> > On 05.05.21 00:15, Andres Freund wrote:
> >> I'm now getting
> >> /home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquire’:
> >> /home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1322:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
> >> 1322 |   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);

> > What compiler are you using in this situation?

gcc - I think the warning is pulled in via -Wextra. I think it's something sensible to warn about, too; it's easy to end up with misleading behaviour when statement-like macros are defined empty.

> All of these buildfarm members are now showing this warning:
>
> calliphoridae gcc (Debian 10.1.0-6) 10.1.0
> culicidae gcc (Debian 10.1.0-6) 10.1.0
> flaviventris gcc (Debian 20200124-1) 10.0.1 20200124 (experimental)
> francolin gcc (Debian 10.1.0-6) 10.1.0
> piculet gcc (Debian 10.1.0-6) 10.1.0
> rorqual gcc (Debian 10.1.0-6) 10.1.0
> serinus gcc (Debian 20200124-1) 10.0.1 20200124 (experimental)
> skink gcc (Debian 10.1.0-6) 10.1.0

I think those likely are all mine, so it's not too surprising. They all use something like

    CFLAGS => '-Og -ggdb -g3 -Wall -Wextra -Wno-unused-parameter -Wno-sign-compare -Wno-missing-field-initializers -fno-omit-frame-pointer',

> (I wonder why flaviventris and serinus are still using an "experimental"
> compiler version that is now behind mainstream.)

The upgrade script didn't install the newer version because it had to remove some conflicting packages... Should be fixed for runs starting now.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-08 13:13:47 -0400, Tom Lane wrote:
>> (I wonder why flaviventris and serinus are still using an "experimental"
>> compiler version that is now behind mainstream.)

> The upgrade script didn't install the newer version because it had to
> remove some conflicting packages... Should be fixed for runs starting
> now.

Looks like that didn't work ...

regards, tom lane
On 05.05.21 06:20, Craig Ringer wrote: > On Wed, 5 May 2021 at 09:15, Craig Ringer <craig.ringer@enterprisedb.com> wrote: > >>> warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] >>> 1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); >>> | ^ >> >> Odd that I didn't get that. > > This compiler complaint is not due to the _ENABLED() test as such. > It's due to the fact that we completely define out the > TRACE_POSTGRESQL_ macros with src/backend/utils/Gen_dummy_probes.sed . > > While explicit braces could be added around each test, I suggest > fixing Gen_dummy_probes.sed to emit the usual dummy statement instead. > Patch attached. Committed, with the Gen_dummy_probes.pl change added.
On 2021-05-09 19:51:13 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-05-08 13:13:47 -0400, Tom Lane wrote:
> >> (I wonder why flaviventris and serinus are still using an "experimental"
> >> compiler version that is now behind mainstream.)
>
> > The upgrade script didn't install the newer version because it had to
> > remove some conflicting packages... Should be fixed for runs starting
> > now.
>
> Looks like that didn't work ...

Looks like it did, but turned out to have some unintended side-effects :(.

The snapshot builds are now new:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure

    configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5
    gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a]

But removing the aforementioned conflicting packages broke the installed old versions of gcc/clang.

I started to build the old versions of llvm manually, but that then hits the issue that at least 3.9 doesn't build with halfway modern versions of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm n-2 with n-1), will take a bit of time.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> Looks like it did, but turned out to have some unintended side-effects
> :(.
> The snapshot builds are now new:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure
> configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5
> gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a]
> But removing the aforementioned conflicting packages broke the
> installed old versions of gcc/clang.
> I started to build the old versions of llvm manually, but that then hits
> the issue that at least 3.9 doesn't build with halfway modern versions
> of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm
> n-2 with n-1), will take a bit of time.

Ugh. Memo to self: don't rag on other people's buildfarm configurations right before a release deadline :-(. Sorry to cause you trouble.

regards, tom lane
Hi,

On 2021-05-10 12:14:46 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Looks like it did, but turned out to have some unintended side-effects
> > :(.
> > The snapshot builds are now new:
> > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure
> > configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5
> > gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a]
> > But removing the aforementioned conflicting packages broke the
> > installed old versions of gcc/clang.
> > I started to build the old versions of llvm manually, but that then hits
> > the issue that at least 3.9 doesn't build with halfway modern versions
> > of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm
> > n-2 with n-1), will take a bit of time.
>
> Ugh. Memo to self: don't rag on other people's buildfarm configurations
> right before a release deadline :-(. Sorry to cause you trouble.

No worries - I knew that I'd have to do this at some point, even though I hadn't planned to do that today... I should have all of them green before end of today.

I found that I actually can build LLVM 3.9 directly, as clang-6 can still build it directly (whereas the oldest gcc still installed can't build it directly). So it's a bit less painful than I thought at first.

The tests for the 3.9 instances (phycodurus, dragonet) are running right now, and I'm fairly sure they'll pass (most of a --noreport --nostatus run passed). Going forward, building LLVM 4, 5, 6 now - the later versions take longer...

Greetings,

Andres Freund
Hi,

On 2021-05-10 09:46:02 -0700, Andres Freund wrote:
> No worries - I knew that I'd have to do this at some point, even though
> I hadn't planned to do that today... I should have all of them green
> before end of today.
>
> I found that I actually can build LLVM 3.9 directly, as clang-6 can
> still build it directly (whereas the oldest gcc still installed can't
> build it directly). So it's a bit less painful than I thought at first.
>
> The tests for the 3.9 instances (phycodurus, dragonet) are running right now,
> and I'm fairly sure they'll pass (most of a --noreport --nostatus run
> passed). Going forward, building LLVM 4, 5, 6 now - the later versions take
> longer...

Looks like it's all clear now. All but the results for 11 had cleared up by yesterday evening, and the rest came in ok overnight.

Greetings,

Andres Freund