Re: Syncrep and improving latency due to WAL throttling - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Syncrep and improving latency due to WAL throttling
Date
Msg-id 20230126154912.okzh7psxox6aphdm@awork3.anarazel.de
Whole thread Raw
In response to Re: Syncrep and improving latency due to WAL throttling  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses Re: Syncrep and improving latency due to WAL throttling
List pgsql-hackers
Hi,

On 2023-01-26 14:40:56 +0100, Jakub Wartak wrote:
> In summary:  Attached is a slightly reworked version of this patch.
> 1. Moved logic outside XLogInsertRecord() under ProcessInterrupts()
> 2. Flushes up to the last page boundary, still uses SyncRepWaitForLSN()
> 3. Removed GUC for now (always on->256kB); potentially to be restored

Huh? Why did you remove the GUC? Clearly this isn't something we can default
to on.


> diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
> index d85e313908..05d56d65f9 100644
> --- a/src/backend/access/transam/xact.c
> +++ b/src/backend/access/transam/xact.c
> @@ -2395,6 +2395,7 @@ CommitTransaction(void)
>  
>      XactTopFullTransactionId = InvalidFullTransactionId;
>      nParallelCurrentXids = 0;
> +    backendWalInserted = 0;
>  
>      /*
>       * done with commit processing, set current transaction state back to

I don't like the resets around like this. Why not just move it into
XLogFlush()?



> +/*
> + * Called from ProcessMessageInterrupts() to avoid waiting while being in critical section
> + */ 
> +void HandleXLogDelayPending()
> +{
> +    /* flush only up to the last fully filled page */
> +    XLogRecPtr     LastFullyWrittenXLogPage = XactLastRecEnd - (XactLastRecEnd % XLOG_BLCKSZ);
> +    XLogDelayPending = false;

Hm - wonder if it'd be better to determine the LSN to wait for in
XLogInsertRecord(). There'll be plenty cases where we'll insert multiple (but
bounded number of) WAL records before processing interrupts. No need to flush
more aggressively than necessary...


> +    //HOLD_INTERRUPTS();
> +
> +    /* XXX Debug for now */
> +    elog(WARNING, "throttling WAL down on this session (backendWalInserted=%d, LSN=%X/%X flushingTo=%X/%X)", 
> +        backendWalInserted, 
> +        LSN_FORMAT_ARGS(XactLastRecEnd),
> +        LSN_FORMAT_ARGS(LastFullyWrittenXLogPage));
> +
> +    /* XXX: refactor SyncRepWaitForLSN() to have different waitevent than default WAIT_EVENT_SYNC_REP  */
> +    /* maybe new WAIT_EVENT_SYNC_REP_BIG or something like that */
> +    XLogFlush(LastFullyWrittenXLogPage);
> +    SyncRepWaitForLSN(LastFullyWrittenXLogPage, false);

SyncRepWaitForLSN() has this comment:
 * 'lsn' represents the LSN to wait for.  'commit' indicates whether this LSN
 * represents a commit record.  If it doesn't, then we wait only for the WAL
 * to be flushed if synchronous_commit is set to the higher level of
 * remote_apply, because only commit records provide apply feedback.


> +    elog(WARNING, "throttling WAL down on this session - end");
> +    backendWalInserted = 0;
> +
> +    //RESUME_INTERRUPTS();
> +}

I think we'd want a distinct wait event for this.



> diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
> index 1b1d814254..8ed66b2eae 100644
> --- a/src/backend/utils/init/globals.c
> +++ b/src/backend/utils/init/globals.c
> @@ -37,6 +37,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
>  volatile sig_atomic_t ProcSignalBarrierPending = false;
>  volatile sig_atomic_t LogMemoryContextPending = false;
>  volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
> +volatile sig_atomic_t XLogDelayPending = false;
>  volatile uint32 InterruptHoldoffCount = 0;
>  volatile uint32 QueryCancelHoldoffCount = 0;
>  volatile uint32 CritSectionCount = 0;

I don't think the new variable needs to be volatile, or even
sig_atomic_t. We'll just manipulate it from outside signal handlers.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Syncrep and improving latency due to WAL throttling
Next
From: Andres Freund
Date:
Subject: Re: Syncrep and improving latency due to WAL throttling