Report distinct wait events when waiting for WAL "operation" - Mailing list pgsql-hackers

From Andres Freund
Subject Report distinct wait events when waiting for WAL "operation"
Date
Msg-id 20230717165544.4hdlynfv3jylnlm4@awork3.anarazel.de
Whole thread Raw
Responses Re: Report distinct wait events when waiting for WAL "operation"
Re: Report distinct wait events when waiting for WAL "operation"
List pgsql-hackers
Hi,

In a number of workloads one can see two wait events prominently:
LWLock:WALWrite and LWLock:WALInsert. Unfortunately for both that is not very
informative:

LWLock:WALWrite can be reported because there is genuine contention on the
LWLock, or, more commonly, because several backends are waiting for another to
finish IO. In the latter case we are not actually waiting to acquire the lock,
we are waiting for the lock to be released *without* then acquiring it.

LWLock:WALInsert can be reported because there are not enough WALInsert locks
(c.f. NUM_XLOGINSERT_LOCKS) or because we are waiting for another backend to
finish copying a WAL record into wal_buffers.  In the latter case we are
therefore not waiting to acquire an LWLock.


I think both of these cases are relevant to distinguish from an operational
perspective. Secondarily, I've received many questions about making those
locks more scalable / granular, when in most of the cases the issue was not
actual lock contention.

Today it's surprisingly hard to figure out whether the issue is lock
contention or the speed of copying buffers for WAL insert locks / computing
the last prc of the CRC checksum.


Therefore I'm proposing that LWLockAcquireOrWait() and LWLockWaitForVar() not
use the "generic" LWLockReportWaitStart(), but use caller provided wait
events.  The attached patch adds two new wait events for the existing callers.

I waffled a bit about which wait event section to add these to. Ended up with
"IPC", but would be happy to change that.

WAIT_EVENT_WAL_WAIT_INSERT  WALWaitInsert   "Waiting for WAL record to be copied into buffers."
WAIT_EVENT_WAL_WAIT_WRITE   WALWaitWrite    "Waiting for WAL buffers to be written or flushed to disk."


Previously it was e.g. not really possible to distinguish that something like
this:

┌────────────────┬─────────────────┬────────────┬───────┐
│  backend_type  │ wait_event_type │ wait_event │ count │
├────────────────┼─────────────────┼────────────┼───────┤
│ client backend │ LWLock          │ WALInsert  │    32 │
│ client backend │ (null)          │ (null)     │     9 │
│ walwriter      │ IO              │ WALWrite   │     1 │
│ client backend │ Client          │ ClientRead │     1 │
│ client backend │ LWLock          │ WALWrite   │     1 │
└────────────────┴─────────────────┴────────────┴───────┘

is a workload with a very different bottleneck than this:

┌────────────────┬─────────────────┬───────────────┬───────┐
│  backend_type  │ wait_event_type │  wait_event   │ count │
├────────────────┼─────────────────┼───────────────┼───────┤
│ client backend │ IPC             │ WALWaitInsert │    22 │
│ client backend │ LWLock          │ WALInsert     │    13 │
│ client backend │ LWLock          │ WALBufMapping │     5 │
│ walwriter      │ (null)          │ (null)        │     1 │
│ client backend │ Client          │ ClientRead    │     1 │
│ client backend │ (null)          │ (null)        │     1 │
└────────────────┴─────────────────┴───────────────┴───────┘

even though they are very different

FWIW, the former is bottlenecked by the number of WAL insertion locks, the
second is bottlenecked by copying WAL into buffers due to needing to flush
them.

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Thom Brown
Date:
Subject: Re: generic plans and "initial" pruning
Next
From: Garrett Thornburg
Date:
Subject: Re: Looking for context around which event triggers are permitted