Re: Add LWLock blocker(s) information - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Add LWLock blocker(s) information
Msg-id CA+Tgmobjva3roo5tO16ii_D_9VnPxWWhiW=K_WakbKcQ0yqkdA@mail.gmail.com
In response to Add LWLock blocker(s) information  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Responses Re: Add LWLock blocker(s) information
List pgsql-hackers
On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> the patch adds into the LWLock struct:
>
>                     last_holding_pid: last pid owner of the lock
>                     last_mode: last holding mode of the last pid owner of the lock
>                     nholders: number of holders (could be >1 in case of LW_SHARED)

There's been significant work done over the years to get the size of
an LWLock down; I'm not very enthusiastic about making it bigger
again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
which embeds one of the LWLocks associated with a BufferDesc into the
structure to reduce the number of cache lines associated with common
buffer operations. I'm not sure whether this patch would increase the
space usage of a BufferDesc to more than one cache line again, but at
the very least it would make it a lot tighter, since it looks like it
adds 12 bytes to the size of each one.
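
The size arithmetic above can be sketched in C. This is only an illustration under stated assumptions: the struct names (SketchLWLock, SketchLWLockPatched), the flattened waiter-list fields, and the 4-byte types chosen for the three new fields are hypothetical stand-ins, not the actual PostgreSQL definitions or the patch itself.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for an LWLock: a tranche id, an
 * atomic state word, and a two-entry waiter list head. Field names and
 * types are assumptions made for this sketch.
 */
typedef struct SketchLWLock
{
    uint16_t tranche;       /* tranche id; followed by 2 bytes of padding */
    uint32_t state;         /* lock state word */
    uint32_t waiters_head;  /* assumed waiter list head index */
    uint32_t waiters_tail;  /* assumed waiter list tail index */
} SketchLWLock;

/*
 * The same layout plus the three fields named in the quoted proposal,
 * each assumed here to occupy 4 bytes.
 */
typedef struct SketchLWLockPatched
{
    uint16_t tranche;
    uint32_t state;
    uint32_t waiters_head;
    uint32_t waiters_tail;
    int32_t  last_holding_pid;  /* last pid owner of the lock */
    int32_t  last_mode;         /* last holding mode (enum assumed 4 bytes) */
    int32_t  nholders;          /* number of holders */
} SketchLWLockPatched;

/* Bytes added to each lock by the three extra fields in this sketch. */
size_t
sketch_lwlock_growth(void)
{
    return sizeof(SketchLWLockPatched) - sizeof(SketchLWLock);
}
```

With these assumed layouts the growth is three 4-byte fields per lock, and any structure that embeds such a lock and is sized to fit a single cache line loses the same amount of slack.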

It's also a little hard to believe that this doesn't hurt performance
on workloads with a lot of LWLock contention, although maybe not; it
doesn't seem crazy expensive, just possibly enough to matter.

I thought a little bit about what this might buy as compared with just
sampling wait events. That by itself is enough to tell you which
LWLocks are heavily contended. It doesn't tell you what they are
contending against, so this would be superior in that regard. However,
I wonder how much of a problem that actually is. Typically, LWLocks
aren't being taken for long periods, so all the things that are
accessing the lock spend some time waiting (which you will see via
wait events in pg_stat_activity) and some time holding the lock
(making you see other things in pg_stat_activity). It's possible to
have cases where this isn't true; e.g. a relatively small number of
backends committing transactions could be slowing down a much larger
number of backends taking snapshots, and you'd mostly only see the
latter waiting for ProcArrayLock. However, those kinds of cases don't
seem super-common or super-difficult to figure out.
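
As a rough illustration of the wait-event sampling approach described above, the following C sketch tallies how often each wait event shows up across a set of samples. It assumes the samples were already collected elsewhere (for example, by periodically reading the wait_event column of pg_stat_activity); the WaitTally type, the tally_wait_events function, and the event names used with it are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <string.h>

/* One row of the tally: a wait event name and how often it was sampled. */
typedef struct WaitTally
{
    const char *event;  /* points into the caller's sample array */
    int         count;
} WaitTally;

/*
 * Count occurrences of each distinct wait event in samples[].
 * A NULL sample stands for a backend that was not waiting and is skipped.
 * Results are written to out[] in first-seen order; events beyond
 * max_out distinct names are ignored. Returns the number of rows filled.
 */
size_t
tally_wait_events(const char *const *samples, size_t nsamples,
                  WaitTally *out, size_t max_out)
{
    size_t ndistinct = 0;

    for (size_t i = 0; i < nsamples; i++)
    {
        size_t j;

        if (samples[i] == NULL)
            continue;

        /* Linear search is fine for the small number of distinct events. */
        for (j = 0; j < ndistinct; j++)
        {
            if (strcmp(out[j].event, samples[i]) == 0)
            {
                out[j].count++;
                break;
            }
        }

        if (j == ndistinct && ndistinct < max_out)
        {
            out[ndistinct].event = samples[i];
            out[ndistinct].count = 1;
            ndistinct++;
        }
    }

    return ndistinct;
}
```

A tally like this shows which locks are heavily waited on, but, as noted above, it says nothing about which backends were holding them at the time.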

What kinds of scenarios motivate you to propose this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


