Thread: Report distinct wait events when waiting for WAL "operation"

Report distinct wait events when waiting for WAL "operation"

From: Andres Freund
Date: Mon, Jul 17, 2023
Hi,

In a number of workloads one can see two wait events prominently:
LWLock:WALWrite and LWLock:WALInsert. Unfortunately, neither is very
informative:

LWLock:WALWrite can be reported because there is genuine contention on the
LWLock or, more commonly, because several backends are waiting for another to
finish IO. In the latter case we are not actually waiting to acquire the lock;
we are waiting for the lock to be released *without* then acquiring it.

LWLock:WALInsert can be reported because there are not enough WALInsert locks
(cf. NUM_XLOGINSERT_LOCKS) or because we are waiting for another backend to
finish copying a WAL record into wal_buffers.  In the latter case we are
therefore not waiting to acquire an LWLock.
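
For context, the two call sites look roughly like this (condensed and
paraphrased from xlog.c around the time of this thread, not the exact code):

/* XLogFlush(): wait for WALWriteLock to be released, but if another
 * backend flushed our WAL for us in the meantime, never acquire it.
 * Either way this shows up as LWLock:WALWrite. */
if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
    continue;           /* recheck whether our WAL was already flushed */

/* WaitXLogInsertionsToFinish(): wait for a concurrent inserter to finish
 * copying its record into wal_buffers, without wanting the lock for
 * ourselves. Shows up as LWLock:WALInsert. */
if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                     &WALInsertLocks[i].l.insertingAt,
                     insertingat, &insertingat))
    insertingat = InvalidXLogRecPtr;    /* lock free, no insert in progress */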


I think both of these cases are worth distinguishing from an operational
perspective. Secondarily, I've received many questions about making those
locks more scalable / granular when, in most cases, the issue was not actual
lock contention.

Today it's surprisingly hard to figure out whether the issue is lock
contention or the speed of copying buffers for WAL insert locks / computing
the last part of the CRC checksum.


Therefore I'm proposing that LWLockAcquireOrWait() and LWLockWaitForVar() not
use the "generic" LWLockReportWaitStart(), but instead use caller-provided
wait events.  The attached patch adds two new wait events for the existing
callers.
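
With the patch, the call sites then look roughly like this (the
LWLockWaitForVar() signature is from the patch; the LWLockAcquireOrWait()
change is sketched by analogy and may differ in detail):

if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE,
                         WAIT_EVENT_WAL_WAIT_WRITE))
    continue;

LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                 &WALInsertLocks[i].l.insertingAt,
                 insertingat, &insertingat,
                 WAIT_EVENT_WAL_WAIT_INSERT);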

I waffled a bit about which wait event section to add these to. Ended up with
"IPC", but would be happy to change that.

WAIT_EVENT_WAL_WAIT_INSERT  WALWaitInsert   "Waiting for WAL record to be copied into buffers."
WAIT_EVENT_WAL_WAIT_WRITE   WALWaitWrite    "Waiting for WAL buffers to be written or flushed to disk."


Previously it was, e.g., not really possible to tell that something like
this:

┌────────────────┬─────────────────┬────────────┬───────┐
│  backend_type  │ wait_event_type │ wait_event │ count │
├────────────────┼─────────────────┼────────────┼───────┤
│ client backend │ LWLock          │ WALInsert  │    32 │
│ client backend │ (null)          │ (null)     │     9 │
│ walwriter      │ IO              │ WALWrite   │     1 │
│ client backend │ Client          │ ClientRead │     1 │
│ client backend │ LWLock          │ WALWrite   │     1 │
└────────────────┴─────────────────┴────────────┴───────┘

is a workload with a very different bottleneck than this:

┌────────────────┬─────────────────┬───────────────┬───────┐
│  backend_type  │ wait_event_type │  wait_event   │ count │
├────────────────┼─────────────────┼───────────────┼───────┤
│ client backend │ IPC             │ WALWaitInsert │    22 │
│ client backend │ LWLock          │ WALInsert     │    13 │
│ client backend │ LWLock          │ WALBufMapping │     5 │
│ walwriter      │ (null)          │ (null)        │     1 │
│ client backend │ Client          │ ClientRead    │     1 │
│ client backend │ (null)          │ (null)        │     1 │
└────────────────┴─────────────────┴───────────────┴───────┘

even though they are very different.

FWIW, the former is bottlenecked by the number of WAL insertion locks, while
the latter is bottlenecked by copying WAL into buffers due to needing to
flush them.

Greetings,

Andres Freund


Re: Report distinct wait events when waiting for WAL "operation"

From: Amit Kapila
Date: Wed, Jul 19, 2023
On Mon, Jul 17, 2023 at 10:26 PM Andres Freund <andres@anarazel.de> wrote:
>
> Previously it was e.g. not really possible to distinguish that something like
> this:
>
> ┌────────────────┬─────────────────┬────────────┬───────┐
> │  backend_type  │ wait_event_type │ wait_event │ count │
> ├────────────────┼─────────────────┼────────────┼───────┤
> │ client backend │ LWLock          │ WALInsert  │    32 │
> │ client backend │ (null)          │ (null)     │     9 │
> │ walwriter      │ IO              │ WALWrite   │     1 │
> │ client backend │ Client          │ ClientRead │     1 │
> │ client backend │ LWLock          │ WALWrite   │     1 │
> └────────────────┴─────────────────┴────────────┴───────┘
>
> is a workload with a very different bottleneck than this:
>
> ┌────────────────┬─────────────────┬───────────────┬───────┐
> │  backend_type  │ wait_event_type │  wait_event   │ count │
> ├────────────────┼─────────────────┼───────────────┼───────┤
> │ client backend │ IPC             │ WALWaitInsert │    22 │
> │ client backend │ LWLock          │ WALInsert     │    13 │
> │ client backend │ LWLock          │ WALBufMapping │     5 │
> │ walwriter      │ (null)          │ (null)        │     1 │
> │ client backend │ Client          │ ClientRead    │     1 │
> │ client backend │ (null)          │ (null)        │     1 │
> └────────────────┴─────────────────┴───────────────┴───────┘
>
> even though they are very different
>
> FWIW, the former is bottlenecked by the number of WAL insertion locks, the
> second is bottlenecked by copying WAL into buffers due to needing to flush
> them.
>

This gives a better idea of what's going on. +1 for separating these waits.

-- 
With Regards,
Amit Kapila.

Re: Report distinct wait events when waiting for WAL "operation"

From: Michael Paquier
On Wed, Jul 19, 2023 at 06:49:57PM +0530, Amit Kapila wrote:
> On Mon, Jul 17, 2023 at 10:26 PM Andres Freund <andres@anarazel.de> wrote:
>> FWIW, the former is bottlenecked by the number of WAL insertion locks, the
>> second is bottlenecked by copying WAL into buffers due to needing to flush
>> them.
>
> This gives a better idea of what's going on. +1 for separating these waits.

+ * As this is not used to wait for lwlocks themselves, the caller has to
+ * provide a wait event to be reported.
  */
 bool
-LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
+LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval,
+                uint32 wait_event_info)

Makes sense to me to do this split, nice!  And this gives more
flexibility for out-of-core callers, while on it.
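
For example (a hypothetical out-of-core caller; MyExtShared and its fields
are made-up names), an extension waiting on its own LWLock-protected
position variable could now report something other than a generic LWLock
wait:

/* Hypothetical extension code: report the generic Extension wait event
 * (PG_WAIT_EXTENSION) instead of an LWLock wait. */
if (LWLockWaitForVar(&MyExtShared->lock,
                     &MyExtShared->written_upto,
                     seen_upto, &seen_upto,
                     PG_WAIT_EXTENSION))
    seen_upto = 0;      /* lock was free */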
--
Michael


Re: Report distinct wait events when waiting for WAL "operation"

From: Bharath Rupireddy
On Mon, Jul 17, 2023 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> Therefore I'm proposing that LWLockAcquireOrWait() and LWLockWaitForVar() not
> use the "generic" LWLockReportWaitStart(), but use caller provided wait
> events.  The attached patch adds two new wait events for the existing callers.

+1 for having separate wait events for WAL insert lock acquisition and for
waiting for WAL insertions to finish. However, I don't think we need to
pass wait events to LWLockAcquireOrWait() and LWLockWaitForVar(); we can
just use the wait events directly in those functions. Because these two
functions are used only for acquiring a WAL insert lock and waiting for WAL
insertions to finish, they aren't multipurpose functions.
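
In other words, something like this inside lwlock.c (a sketch of that
alternative, not of the patch as posted):

bool
LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
{
    /* ... */

    /* Hardcode the event: waiting for a WAL insertion to finish is the
     * only thing this function is used for. */
    pgstat_report_wait_start(WAIT_EVENT_WAL_WAIT_INSERT);
    /* ... sleep until the variable changes or the lock is released ... */
    pgstat_report_wait_end();

    /* ... */
}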

> I waffled a bit about which wait event section to add these to. Ended up with
> "IPC", but would be happy to change that.
>
> WAIT_EVENT_WAL_WAIT_INSERT  WALWaitInsert   "Waiting for WAL record to be copied into buffers."
> WAIT_EVENT_WAL_WAIT_WRITE   WALWaitWrite    "Waiting for WAL buffers to be written or flushed to disk."

IPC seems okay to me. If not, how about the PG_WAIT_LWLOCK event
class? Or have WAIT_EVENT_WAL_WAIT_WRITE under PG_WAIT_IO and the
other under PG_WAIT_IPC?
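
For reference, the classes in question are just the high byte of the uint32
wait_event_info (values as I recall them from wait_event.h):

#define PG_WAIT_LWLOCK      0x01000000U
#define PG_WAIT_IPC         0x08000000U
#define PG_WAIT_IO          0x0A000000U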

> ┌────────────────┬─────────────────┬───────────────┬───────┐
> │  backend_type  │ wait_event_type │  wait_event   │ count │
> ├────────────────┼─────────────────┼───────────────┼───────┤
> │ client backend │ IPC             │ WALWaitInsert │    22 │
> │ client backend │ LWLock          │ WALInsert     │    13 │
> │ client backend │ LWLock          │ WALBufMapping │     5 │
> │ walwriter      │ (null)          │ (null)        │     1 │
> │ client backend │ Client          │ ClientRead    │     1 │
> │ client backend │ (null)          │ (null)        │     1 │
> └────────────────┴─────────────────┴───────────────┴───────┘
>
> even though they are very different
>
> FWIW, the former is bottlenecked by the number of WAL insertion locks, the
> second is bottlenecked by copying WAL into buffers due to needing to flush
> them.

This separation looks clean and gives much more info.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com