Re: Syncrep and improving latency due to WAL throttling - Mailing list pgsql-hackers
From | Soumyadeep Chakraborty |
---|---|
Subject | Re: Syncrep and improving latency due to WAL throttling |
Date | |
Msg-id | CAE-ML+_Sf4oTYfm-XLo_hF114uEGA-qvXnoBbAF967Nu_JygYw@mail.gmail.com Whole thread Raw |
In response to | Re: Syncrep and improving latency due to WAL throttling (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Syncrep and improving latency due to WAL throttling
|
List | pgsql-hackers |
Hey Tomas, Shirisha had posted a recent re-attempt here [1] and then we were graciously redirected here by Jakub. We took a look at v5-0001-v4.patch and also a brief look at v5-0002-rework.patch. We feel that it might be worth considering throttling based on the remote standby to begin with for simplicity (i.e. wal_flush_after_remote), and then we can add the other knobs incrementally? Our comments on v5-0001-v4.patch are below: > + > + /* > + * Decide if we need to throttle this backend, so that it does not write > + * WAL too fast, causing lag against the sync standby (which in turn > + * increases latency for standby confirmations). We may be holding locks > + * and blocking interrupts here, so we only make the decision, but the > + * wait (for sync standby confirmation) happens elsewhere. Slightly reworded: * wait (for sync standby confirmation) happens as part of the next * CHECK_FOR_INTERRUPTS(). See HandleXLogDelayPending() for details on * why the delay is deferred. > + * > + * The throttling is applied only to large transactions (producing more > + * than wal_throttle_threshold kilobytes of WAL). Throttled backends > + * can be identified by a new wait event SYNC_REP_THROTTLED. > + * > + * Small transactions (by amount of produced WAL) are still subject to > + * the sync replication, so the same wait happens at commit time. > + * > + * XXX Not sure this is the right place for a comment explaining how the > + * throttling works. This place is way too low level, and rather far from > + * the place where the wait actually happens. Perhaps the best course of action is to move the code and the comments above into an inline function: SetAndCheckWALThrottle(). > + > +/* > + * HandleXLogDelayPending > + * Throttle backends generating large amounts of WAL. > + * > + * The throttling is implemented by waiting for a sync replica confirmation for > + * a convenient LSN position. In particular, we do not wait for the current LSN, > + * which may be in a partially filled WAL page (and we don't want to write this > + * one out - we'd have to write it out again, causing write amplification). > + * Instead, we move back to the last fully WAL page. > + * > + * Called from ProcessMessageInterrupts() to avoid syncrep waits in XLogInsert(), > + * which happens in critical section and with blocked interrupts (so it would be > + * impossible to cancel the wait if it gets stuck). Also, there may be locks held > + * and we don't want to hold them longer just because of the wait. > + * > + * XXX Andres suggested we actually go back a couple pages, to increase the > + * probability the LSN was already flushed (obviously, this depends on how much > + * lag we allow). > + * > + * XXX Not sure why we use XactLastThrottledRecEnd and not simply XLogRecEnd? > + */ > +void > +HandleXLogDelayPending() > +{ > + XLogRecPtr lsn; > + > + /* calculate last fully filled page */ > + lsn = XactLastThrottledRecEnd - (XactLastThrottledRecEnd % XLOG_BLCKSZ); > + > + Assert(wal_throttle_threshold > 0); > + Assert(backendWalInserted >= wal_throttle_threshold * 1024L); > + Assert(XactLastThrottledRecEnd != InvalidXLogRecPtr); > + > + XLogFlush(lsn); > + SyncRepWaitForLSN(lsn, false); > + > + XLogDelayPending = false; > +} (1) Can't we simply call SyncRepWaitForLSN(LogwrtResult.Flush, false); here? LogwrtResult.Flush will guarantee that we are waiting on something that has already been flushed or will be flushed soon. Then we wouldn't need to maintain XactLastThrottledRecEnd, nor call XLogFlush() before calling SyncRepWaitForLSN(). LogwrtResult can be slightly stale, but does that really matter here? (2) Also, to make things more understandable, instead of maintaining a counter to track the number of WAL bytes written, maybe we should maintain a LSN pointer called XLogDelayWindowStart. And then in here, we can assert: Assert(LogwrtResult.Flush - XLogDelayWindowStart > wal_throttle_threshold * 1024); Similarly, we can check the same conditional in SetAndCheckWALThrottle(). After the wait is done and we reset XLogDelayPending = false, we can perhaps reset XLogDelayWindowStart = LogwrtResult.Flush. The only downside probably is that if our standby is caught up enough, we may be repeatedly and unnecessarily invoking a HandleXLogDelayPending(), where our SyncRepWaitForLSN() would be called with an older LSN and it would be a no-op early exit. > + * XXX Should this be done even if XLogDelayPending is already set? Maybe > + * that should only update XactLastThrottledRecEnd, withoug incrementing > + * the pgWalUsage.wal_throttled counter? > + */ > + backendWalInserted += rechdr->xl_tot_len; > + > + if ((synchronous_commit >= SYNCHRONOUS_COMMIT_REMOTE_WRITE) && > + (wal_throttle_threshold > 0) && > + (backendWalInserted >= wal_throttle_threshold * 1024L)) > + { > + XactLastThrottledRecEnd = XactLastRecEnd; > + InterruptPending = true; > + XLogDelayPending = true; > + pgWalUsage.wal_throttled++; XLogDelayWindowStart = LogwrtResult.Flush; > + } Yeah we shouldn't do all of this if XLogDelayPending is already set. We can add a quick !XLogDelayPending leading conditional. Also, [1] has a TAP test that may be useful. Regards, Soumyadeep & Shirisha Broadcom [1] https://www.postgresql.org/message-id/CAP3-t08umaBEUEppzBVY6%3D%3D3tbdLwG7b4wfrba73zfOAUrRsoQ%40mail.gmail.com
pgsql-hackers by date: