Re: Syncrep and improving latency due to WAL throttling - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: Syncrep and improving latency due to WAL throttling
Date
Msg-id CAKZiRmxk-e=RnueWqu52p55p1CzomDDtqDT=1pxewQgA0V60BA@mail.gmail.com
In response to Re: Syncrep and improving latency due to WAL throttling  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Syncrep and improving latency due to WAL throttling
List pgsql-hackers
On Wed, Feb 1, 2023 at 2:14 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

> > Maybe we should avoid calling fsyncs for WAL throttling? (by teaching
> > HandleXLogDelayPending()->XLogFlush()->XLogWrite() NOT to sync when
> > we are flushing just because of WAL throttling?) Would that still be
> > safe?
>
> It's not clear to me how this could work and still be safe. I mean, we
> *must* flush the local WAL first, otherwise the replica could get ahead
> (if we send unflushed WAL to replica and then crash). Which would be
> really bad, obviously.

Well, it was just a thought: in this particular test - with no other
concurrent activity happening - we are fsync()ing uncommitted
Heap/INSERT data much earlier than the final Transaction/COMMIT WAL
record comes into play. I agree that some other concurrent backend's
COMMIT could fsync it, but I was wondering if that's a sensible
optimization to perform (so that issue_fsync() would be called only
for commit/rollback records). I can imagine a scenario with 10 such
concurrent backends running - all of them with this $thread-GUC set -
and that would cause 20k unnecessary fsyncs (?) -- (assuming a single
HDD with IOlat=20ms and a standby capable of sync-ack < 0.1ms, close
to 400s would be wasted just on local fsyncs?). I don't have a strong
opinion or in-depth knowledge on this, but that smells like IO waste.
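
For what it's worth, the back-of-envelope number above can be
reproduced with a few lines of Python. This is only a rough sketch
under the assumptions already stated (20k fsyncs in total, 20ms per
fsync on a single HDD, all fsyncs serialized on that one device); the
function and variable names are made up for illustration and are not
anything from the patch.

```python
def wasted_fsync_seconds(fsync_count: int, fsync_latency_ms: float) -> float:
    """Total time spent in fsyncs, assuming they are fully serialized
    on one device (an assumption, not a measurement)."""
    return fsync_count * fsync_latency_ms / 1000.0


# Figures taken from the estimate above; both are assumptions.
total_fsyncs = 20_000      # unnecessary fsyncs across all backends
hdd_latency_ms = 20.0      # per-fsync latency of a single HDD

local_cost_s = wasted_fsync_seconds(total_fsyncs, hdd_latency_ms)
print(f"local fsync cost: {local_cost_s:.0f} s")
```

With a sync-ack below 0.1ms, the same 20k round trips to the standby
would add up to less than about 2s, so in this scenario the local
fsyncs would dominate.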

-J.


