Re: Syncrep and improving latency due to WAL throttling - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Syncrep and improving latency due to WAL throttling
Date
Msg-id 88357a73-fd43-3d3a-58ba-a9ab63cd9521@enterprisedb.com
In response to Re: Syncrep and improving latency due to WAL throttling  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses Re: Syncrep and improving latency due to WAL throttling
List pgsql-hackers
On 2/1/23 14:40, Jakub Wartak wrote:
> On Wed, Feb 1, 2023 at 2:14 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> 
>>> Maybe we should avoid calling fsyncs for WAL throttling? (by teaching
>>> HandleXLogDelayPending()->XLogFlush()->XLogWrite() NOT to sync when
>>> we are flushing just because of WAL throttling?) Would that still be
>>> safe?
>>
>> It's not clear to me how could this work and still be safe. I mean, we
>> *must* flush the local WAL first, otherwise the replica could get ahead
>> (if we send unflushed WAL to replica and then crash). Which would be
>> really bad, obviously.
> 
> Well, it was just a thought: in this particular test - with no other
> concurrent activity happening - we are fsyncing uncommitted
> Heap/INSERT data much earlier than the final Transaction/COMMIT WAL
> record comes into play.

Right. I see it as testing (more or less) a worst-case scenario,
measuring the impact on commands generating a lot of WAL. I'm not sure
the slowdown comes from the extra fsyncs, though - I'd bet it's more
about the extra waits for confirmations from the replica.
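
To make the ordering constraint concrete, here is a minimal stand-alone
sketch (emphatically not PostgreSQL code - every name in it is a made-up
stand-in for the real XLogFlush()/SyncRepWaitForLSN() machinery): the
backend has to make its WAL durable locally before it waits for the sync
standby, and that flush is where the extra fsyncs come from.

#include <stdio.h>

typedef unsigned long long XLogRecPtr;      /* stand-in for the real type */

static XLogRecPtr local_flush_lsn = 0;      /* pretend "flushed up to" LSN */

static void
flush_local_wal(XLogRecPtr upto)
{
    /* write() + fsync() of the local WAL - the fsync being discussed */
    printf("fsync local WAL up to %llu\n", upto);
    local_flush_lsn = upto;
}

static void
wait_for_standby_ack(XLogRecPtr upto)
{
    printf("wait for the sync standby to confirm %llu\n", upto);
}

static void
throttle_backend(XLogRecPtr pending_lsn)
{
    /*
     * Skipping this flush would mean waiting for the standby to confirm
     * WAL the primary could still lose in a crash - i.e. the replica
     * could get ahead, which is the safety concern above.
     */
    if (local_flush_lsn < pending_lsn)
        flush_local_wal(pending_lsn);

    wait_for_standby_ack(pending_lsn);
}

int
main(void)
{
    throttle_backend(1024 * 1024);
    return 0;
}

Whether the fsync in that first step can be deferred without breaking the
invariant is exactly the open question here.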

> I agree that some other concurrent backend's
> COMMIT could fsync it, but I was wondering if that's a sensible
> optimization to perform (so that issue_fsync() would be called for
> only commit/rollback records). I can imagine a scenario with 10 such
> concurrent backends running - all of them with this $thread-GUC set -
> but that would cause 20k unnecessary fsyncs (?) -- (assuming a single
> HDD with IOlat=20ms and a standby capable of sync-ack < 0.1ms, close
> to 400s would be wasted just on local fsyncs?). I don't have a strong
> opinion or in-depth knowledge on this, but that smells like IO waste.
> 
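
(For the record, the arithmetic behind that estimate: 20,000 extra fsyncs
at ~20ms each is 20,000 x 0.02s = 400s of cumulative local fsync wait.)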

I'm not sure what optimization you mean, but triggering the WAL flushes
from a separate process would be beneficial. We already do that, more or
less - that's what the WAL writer is about, right? Maybe it's not
aggressive enough or something, I'm not sure.

But I think the backends still have to sleep at some point, so that they
don't queue up too much unflushed WAL - that's kinda the whole point, no?
The issue is more about triggering the throttling too early, before we
hit the bandwidth limit. That happens simply because we don't have a
very good way to decide whether the latency is growing, so the patch
just throttles everything.

Consider a replica on a network link with a 10ms round trip. Then commit
latency can't really be better than 10ms, and throttling at that point
can't really improve anything - it just makes things slower. Instead, we
should measure the latency somehow, and only throttle when it increases.
And probably make the throttling proportional to the delta (so the further
it is above the "minimal" latency, the more we'd throttle).

I'd imagine we'd measure the latency (or the wait for the sync replica)
over reasonably short time windows (1/10 of a second?), and use that to
drive the throttling. If the latency is below some acceptable value,
don't throttle at all. If it increases, start throttling.
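
To make that a bit more concrete, here is a minimal stand-alone sketch of
the windowed idea (not the patch - every name and constant below is an
assumption): remember the lowest per-window ack latency seen so far as the
"minimal" latency, and sleep proportionally to how far the current window
is above it.

#include <stdint.h>
#include <stdio.h>

static int64_t baseline_usec = INT64_MAX;   /* lowest window average seen */

/*
 * Given the average sync-rep ack latency over the last short window
 * (say 100ms), return how long a WAL-producing backend should sleep:
 * nothing while the latency stays near the floor, and a delay
 * proportional to the excess once it starts growing.
 */
static int64_t
throttle_usec(int64_t window_avg_usec)
{
    const int64_t slack_usec = 1000;    /* ignore small jitter */
    const double  gain = 0.5;           /* sleep 0.5us per 1us of excess */
    int64_t       excess;

    if (window_avg_usec < baseline_usec)
        baseline_usec = window_avg_usec;        /* learn the floor */

    excess = window_avg_usec - baseline_usec;
    if (excess <= slack_usec)
        return 0;                               /* latency not growing */

    return (int64_t) (gain * (double) (excess - slack_usec));
}

int
main(void)
{
    /* simulated per-window average ack latencies on a ~10ms RTT link */
    int64_t samples[] = {10200, 10100, 10300, 15000, 25000, 40000};
    int     i;

    for (i = 0; i < 6; i++)
        printf("window avg %6lld us -> sleep %6lld us\n",
               (long long) samples[i], (long long) throttle_usec(samples[i]));
    return 0;
}

With that shape, a replica on a 10ms link is never throttled merely for
having a 10ms commit latency; throttling only kicks in once the measured
latency drifts above that floor.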


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


