Re: Syncrep and improving latency due to WAL throttling - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Syncrep and improving latency due to WAL throttling |
Date | |
Msg-id | 20230126154912.okzh7psxox6aphdm@awork3.anarazel.de Whole thread Raw |
In response to | Re: Syncrep and improving latency due to WAL throttling (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
Responses |
Re: Syncrep and improving latency due to WAL throttling
|
List | pgsql-hackers |
Hi, On 2023-01-26 14:40:56 +0100, Jakub Wartak wrote: > In summary: Attached is a slightly reworked version of this patch. > 1. Moved logic outside XLogInsertRecord() under ProcessInterrupts() > 2. Flushes up to the last page boundary, still uses SyncRepWaitForLSN() > 3. Removed GUC for now (always on->256kB); potentially to be restored Huh? Why did you remove the GUC? Clearly this isn't something we can default to on. > diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c > index d85e313908..05d56d65f9 100644 > --- a/src/backend/access/transam/xact.c > +++ b/src/backend/access/transam/xact.c > @@ -2395,6 +2395,7 @@ CommitTransaction(void) > > XactTopFullTransactionId = InvalidFullTransactionId; > nParallelCurrentXids = 0; > + backendWalInserted = 0; > > /* > * done with commit processing, set current transaction state back to I don't like the resets around like this. Why not just move it into XLogFlush()? > +/* > + * Called from ProcessMessageInterrupts() to avoid waiting while being in critical section > + */ > +void HandleXLogDelayPending() > +{ > + /* flush only up to the last fully filled page */ > + XLogRecPtr LastFullyWrittenXLogPage = XactLastRecEnd - (XactLastRecEnd % XLOG_BLCKSZ); > + XLogDelayPending = false; Hm - wonder if it'd be better to determine the LSN to wait for in XLogInsertRecord(). There'll be plenty cases where we'll insert multiple (but bounded number of) WAL records before processing interrupts. No need to flush more aggressively than necessary... > + //HOLD_INTERRUPTS(); > + > + /* XXX Debug for now */ > + elog(WARNING, "throttling WAL down on this session (backendWalInserted=%d, LSN=%X/%X flushingTo=%X/%X)", > + backendWalInserted, > + LSN_FORMAT_ARGS(XactLastRecEnd), > + LSN_FORMAT_ARGS(LastFullyWrittenXLogPage)); > + > + /* XXX: refactor SyncRepWaitForLSN() to have different waitevent than default WAIT_EVENT_SYNC_REP */ > + /* maybe new WAIT_EVENT_SYNC_REP_BIG or something like that */ > + XLogFlush(LastFullyWrittenXLogPage); > + SyncRepWaitForLSN(LastFullyWrittenXLogPage, false); SyncRepWaitForLSN() has this comment: * 'lsn' represents the LSN to wait for. 'commit' indicates whether this LSN * represents a commit record. If it doesn't, then we wait only for the WAL * to be flushed if synchronous_commit is set to the higher level of * remote_apply, because only commit records provide apply feedback. > + elog(WARNING, "throttling WAL down on this session - end"); > + backendWalInserted = 0; > + > + //RESUME_INTERRUPTS(); > +} I think we'd want a distinct wait event for this. > diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c > index 1b1d814254..8ed66b2eae 100644 > --- a/src/backend/utils/init/globals.c > +++ b/src/backend/utils/init/globals.c > @@ -37,6 +37,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false; > volatile sig_atomic_t ProcSignalBarrierPending = false; > volatile sig_atomic_t LogMemoryContextPending = false; > volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false; > +volatile sig_atomic_t XLogDelayPending = false; > volatile uint32 InterruptHoldoffCount = 0; > volatile uint32 QueryCancelHoldoffCount = 0; > volatile uint32 CritSectionCount = 0; I don't think the new variable needs to be volatile, or even sig_atomic_t. We'll just manipulate it from outside signal handlers. Greetings, Andres Freund
pgsql-hackers by date: