Re: Syncrep and improving latency due to WAL throttling - Mailing list pgsql-hackers

From Soumyadeep Chakraborty
Subject Re: Syncrep and improving latency due to WAL throttling
Date
Msg-id CAE-ML+_Sf4oTYfm-XLo_hF114uEGA-qvXnoBbAF967Nu_JygYw@mail.gmail.com
Whole thread Raw
In response to Re: Syncrep and improving latency due to WAL throttling  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Syncrep and improving latency due to WAL throttling
List pgsql-hackers
Hey Tomas,

Shirisha had posted a recent re-attempt here [1] and then we were
graciously redirected here by Jakub.

We took a look at v5-0001-v4.patch and also a brief look at
v5-0002-rework.patch. We feel that it might be worth considering
throttling based on the remote standby to begin with for simplicity (i.e.
wal_flush_after_remote), and then we can add the other knobs incrementally?

Our comments on v5-0001-v4.patch are below:

> +
> + /*
> + * Decide if we need to throttle this backend, so that it does not write
> + * WAL too fast, causing lag against the sync standby (which in turn
> + * increases latency for standby confirmations). We may be holding locks
> + * and blocking interrupts here, so we only make the decision, but the
> + * wait (for sync standby confirmation) happens elsewhere.

Slightly reworded:

* wait (for sync standby confirmation) happens as part of the next
* CHECK_FOR_INTERRUPTS(). See HandleXLogDelayPending() for details on
* why the delay is deferred.

> + *
> + * The throttling is applied only to large transactions (producing more
> + * than wal_throttle_threshold kilobytes of WAL). Throttled backends
> + * can be identified by a new wait event SYNC_REP_THROTTLED.
> + *
> + * Small transactions (by amount of produced WAL) are still subject to
> + * the sync replication, so the same wait happens at commit time.
> + *
> + * XXX Not sure this is the right place for a comment explaining how the
> + * throttling works. This place is way too low level, and rather far from
> + * the place where the wait actually happens.

Perhaps the best course of action is to move the code and the comments
above into an inline function: SetAndCheckWALThrottle().

> +
> +/*
> + * HandleXLogDelayPending
> + * Throttle backends generating large amounts of WAL.
> + *
> + * The throttling is implemented by waiting for a sync replica confirmation for
> + * a convenient LSN position. In particular, we do not wait for the current LSN,
> + * which may be in a partially filled WAL page (and we don't want to write this
> + * one out - we'd have to write it out again, causing write amplification).
> + * Instead, we move back to the last fully WAL page.
> + *
> + * Called from ProcessMessageInterrupts() to avoid syncrep waits in XLogInsert(),
> + * which happens in critical section and with blocked interrupts (so it would be
> + * impossible to cancel the wait if it gets stuck). Also, there may be locks held
> + * and we don't want to hold them longer just because of the wait.
> + *
> + * XXX Andres suggested we actually go back a couple pages, to increase the
> + * probability the LSN was already flushed (obviously, this depends on how much
> + * lag we allow).
> + *
> + * XXX Not sure why we use XactLastThrottledRecEnd and not simply XLogRecEnd?
> + */
> +void
> +HandleXLogDelayPending()
> +{
> + XLogRecPtr lsn;
> +
> + /* calculate last fully filled page */
> + lsn = XactLastThrottledRecEnd - (XactLastThrottledRecEnd % XLOG_BLCKSZ);
> +
> + Assert(wal_throttle_threshold > 0);
> + Assert(backendWalInserted >= wal_throttle_threshold * 1024L);
> + Assert(XactLastThrottledRecEnd != InvalidXLogRecPtr);
> +
> + XLogFlush(lsn);
> + SyncRepWaitForLSN(lsn, false);
> +
> + XLogDelayPending = false;
> +}


(1) Can't we simply call SyncRepWaitForLSN(LogwrtResult.Flush, false); here?
LogwrtResult.Flush will guarantee that we are waiting on something that
has already been flushed or will be flushed soon. Then we wouldn't need
to maintain XactLastThrottledRecEnd, nor call XLogFlush() before calling
SyncRepWaitForLSN(). LogwrtResult can be slightly stale, but does that
really matter here?

(2) Also, to make things more understandable, instead of maintaining a
counter to track the number of WAL bytes written, maybe we should
maintain a LSN pointer called XLogDelayWindowStart. And then in here,
we can assert:

Assert(LogwrtResult.Flush - XLogDelayWindowStart >
wal_throttle_threshold * 1024);

Similarly, we can check the same conditional in SetAndCheckWALThrottle().

After the wait is done and we reset XLogDelayPending = false, we can
perhaps reset XLogDelayWindowStart = LogwrtResult.Flush.

The only downside probably is that if our standby is caught up enough, we may be
repeatedly and unnecessarily invoking a HandleXLogDelayPending(), where our
SyncRepWaitForLSN() would be called with an older LSN and it would be a no-op
early exit.


> + * XXX Should this be done even if XLogDelayPending is already set? Maybe
> + * that should only update XactLastThrottledRecEnd, withoug incrementing
> + * the pgWalUsage.wal_throttled counter?
> + */
> + backendWalInserted += rechdr->xl_tot_len;
> +
> + if ((synchronous_commit >= SYNCHRONOUS_COMMIT_REMOTE_WRITE) &&
> + (wal_throttle_threshold > 0) &&
> + (backendWalInserted >= wal_throttle_threshold * 1024L))
> + {
> + XactLastThrottledRecEnd = XactLastRecEnd;
> + InterruptPending = true;
> + XLogDelayPending = true;
> + pgWalUsage.wal_throttled++;
XLogDelayWindowStart = LogwrtResult.Flush;
> + }

Yeah we shouldn't do all of this if XLogDelayPending is already set.
We can add a quick
!XLogDelayPending leading conditional.

Also, [1] has a TAP test that may be useful.

Regards,
Soumyadeep & Shirisha
Broadcom

[1] https://www.postgresql.org/message-id/CAP3-t08umaBEUEppzBVY6%3D%3D3tbdLwG7b4wfrba73zfOAUrRsoQ%40mail.gmail.com



pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Add contrib/pg_logicalsnapinspect
Next
From: Srirama Kucherlapati
Date:
Subject: RE: AIX support