Re: Syncrep and improving latency due to WAL throttling - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: Syncrep and improving latency due to WAL throttling
Date
Msg-id CAKZiRmyR_OBZfvaG03piRxxg7XDC+dmGx50P6Pmn-tMBLLdhVQ@mail.gmail.com
In response to Re: Syncrep and improving latency due to WAL throttling  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Syncrep and improving latency due to WAL throttling
List pgsql-hackers
On Thu, Feb 2, 2023 at 11:03 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

> > I agree that some other concurrent backend's
> > COMMIT could fsync it, but I was wondering if that's a sensible
> > optimization to perform (so that issue_fsync() would be called for
> > only commit/rollback records). I can imagine a scenario with 10 such
> > concurrent backends running - all of them with this $thread-GUC set -
> > but that would cause 20k unnecessary fsyncs (?) -- (assuming a single
> > HDD with IOlat=20ms and a standby capable of sync-ack < 0.1ms, close
> > to 400s would be wasted just due to local fsyncs?). I don't have
> > a strong opinion or in-depth knowledge on this, but that smells like IO waste.
> >
>
> Not sure what optimization you mean,

Let me clarify: I mean something like the change below (on top of v3),
purely to save IOPS:

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2340,6 +2340,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
                if (sync_method != SYNC_METHOD_OPEN &&
                        sync_method != SYNC_METHOD_OPEN_DSYNC)
                {
+                       bool openedLogFile = false;
                        if (openLogFile >= 0 &&
                                !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                                                 wal_segment_size))
@@ -2351,9 +2352,15 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
                                openLogTLI = tli;
                                openLogFile = XLogFileOpen(openLogSegNo, tli);
                                ReserveExternalFD();
+                               openedLogFile = true;
                        }

-                       issue_xlog_fsync(openLogFile, openLogSegNo, tli);
+                       /* can we bypass fsyncing() XLOG from the backend if
+                        * we have been called without commit request?
+                        * usually the feature will be off here (XLogDelayPending=false)
+                        */
+                       if(openedLogFile == true || XLogDelayPending == false)
+                               issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                }

plus maybe some additional logic to ensure that this IOPS-saving
micro-optimization is not applied when the backend calls
XLogFlush()/XLogWrite() for an actual COMMIT record.
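
To make the intended condition concrete, here is a minimal standalone sketch of it, including the extra COMMIT guard mentioned above. The helper name and the is_commit_record flag are hypothetical and exist neither in PostgreSQL nor in the patch; how the caller would learn that it is flushing a COMMIT record is left open.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only (not PostgreSQL code).
 *
 * opened_log_file  - a WAL segment file was opened during this write
 *                    (mirrors openedLogFile in the diff)
 * delay_pending    - throttling is pending for this backend
 *                    (mirrors XLogDelayPending in the diff)
 * is_commit_record - assumed flag: the write is for a COMMIT record
 */
static bool
should_issue_fsync(bool opened_log_file, bool delay_pending,
                   bool is_commit_record)
{
    /* A COMMIT record must never skip the local fsync. */
    if (is_commit_record)
        return true;

    /* Same condition as in the diff: skip only when throttling is
     * pending and no segment file was opened by this call. */
    return opened_log_file || !delay_pending;
}
```

With this shape the fsync is skipped in exactly one case: a non-COMMIT write, with throttling pending, that did not open a segment file.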

> But I think the backends still have to sleep at some point, so that they
> don't queue too much unflushed WAL - that's kinda the whole point, no?

Yes, but the WAL could be flushed to the standby and written out locally
without being fsynced locally (?), provided that it is not a COMMIT
record. I'm just wondering whether that makes sense (Question 1).
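
For scale, the back-of-envelope estimate quoted earlier in this mail (20k fsyncs at 20ms each) can be reproduced with a trivial helper. The function name is made up for illustration, and the inputs are the assumed figures from that estimate, not measurements.

```c
/*
 * Hypothetical helper: total time spent in local fsyncs, in seconds,
 * assuming every fsync takes the same latency and they do not overlap.
 */
static double
wasted_fsync_seconds(long fsync_count, double fsync_latency_ms)
{
    return (double) fsync_count * fsync_latency_ms / 1000.0;
}

/* Example: wasted_fsync_seconds(20000, 20.0) is about 400 seconds. */
```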

> The issue is more about triggering the throttling too early, before we
> hit the bandwidth limit. Which happens simply because we don't have a
> very good way to decide whether the latency is growing, so the patch
> just throttles everything.

I suppose the maximum TCP bandwidth fluctuates in the real world, so it
couldn't be a hard limit. On the other hand, I can imagine operators
setting
"throttle-those-backends-if-global-WALlatencyORrate>XXX"
as an administrative decision. That would be cool to have, but yes, it
would require WAL latency and rate measurement first (which on its own
would make a very nice addition to pg_stat_replication). One thing to
note is that there could be many latencies (and WAL throughput rates)
to consider, e.g. a quorum of 3 sync standbys with different
latencies: which one to choose?
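
One possible answer, purely as a sketch: with quorum commit in the ANY k (...) style, a commit waits for acknowledgements from k of the n sync standbys, so the k-th smallest latency is the one a committing backend actually observes. The function below is hypothetical and the per-standby latencies are assumed to be measured somewhere else, which nothing does today.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending. */
static int
cmp_double(const void *a, const void *b)
{
    double da = *(const double *) a;
    double db = *(const double *) b;

    return (da > db) - (da < db);
}

/*
 * Hypothetical helper: latency (ms) seen by a commit that needs k
 * acknowledgements out of n standbys, i.e. the k-th smallest value.
 * Returns -1.0 on invalid input or allocation failure.
 */
static double
quorum_ack_latency_ms(const double *latencies_ms, size_t n, size_t k)
{
    double *sorted;
    double  result;

    if (n == 0 || k == 0 || k > n)
        return -1.0;

    /* Sort a copy so the caller's array is left untouched. */
    sorted = malloc(n * sizeof(double));
    if (sorted == NULL)
        return -1.0;
    memcpy(sorted, latencies_ms, n * sizeof(double));
    qsort(sorted, n, sizeof(double), cmp_double);

    result = sorted[k - 1];
    free(sorted);
    return result;
}
```

For priority-based (FIRST k) setups the relevant value would instead be the slowest of the k highest-priority standbys, so the choice depends on the configured method.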

(Question 2) I think we have simply reached a decision point: is the
WIP/PoC good enough as it is (as Andres wanted, and you +1'd), should
it work as you propose, or should we keep your proposal as an idea for
the future?

-J.


