Re: Allow async standbys wait for sync replication - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Allow async standbys wait for sync replication
Date
Msg-id 20220309020123.sneaoijlg3rszvst@alap3.anarazel.de
In response to Re: Allow async standbys wait for sync replication  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Allow async standbys wait for sync replication  (Nathan Bossart <nathandbossart@gmail.com>)
Re: Allow async standbys wait for sync replication  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi,

On 2022-03-06 12:27:52 +0530, Bharath Rupireddy wrote:
> On Sun, Mar 6, 2022 at 1:57 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2022-03-05 14:14:54 +0530, Bharath Rupireddy wrote:
> > > I understand. Even if we use the SyncRepWaitForLSN approach, the async
> > > walsenders will have to do nothing in WalSndLoop() until the sync
> > > walsender wakes them up via SyncRepWakeQueue.
> >
> > I still think we should flat out reject this approach. The proper way to
> > implement this feature is to change the protocol so that WAL can be sent to
> > replicas with an additional LSN informing them up to where WAL can be
> > flushed. That way WAL is already sent when the sync replicas have acknowledged
> > receipt and just an updated "flush/apply up to here" LSN has to be sent.
> 
> I was having this thought back of my mind. Please help me understand these:
> 1) How will the async standbys ignore the WAL received but
> not-yet-flushed by them in case the sync standbys don't acknowledge
> flush LSN back to the primary for whatever reasons?

What do you mean by "ignore"? When replaying?

I think this'd require adding a new pg_control field saying up to which LSN
WAL is "valid". If that field is set, replay would only replay up to that LSN
unless some explicit operation is taken to replay further (e.g. for data
recovery).
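
To illustrate the replay rule described above, here is a minimal sketch in Python. It is a toy model, not PostgreSQL code: `WalRecord`, `replay`, `valid_upto`, and `force` are hypothetical names chosen for the example, and LSNs are plain integers. `valid_upto=None` stands for the field being unset, and `force=True` stands for the explicit "replay further" operation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WalRecord:
    end_lsn: int   # LSN just past the end of this record (toy integer LSN)
    payload: str


def replay(records: List[WalRecord],
           valid_upto: Optional[int],
           force: bool = False) -> List[str]:
    """Return the payloads that would be replayed, in LSN order.

    valid_upto=None models the (hypothetical) control-file field being unset,
    so there is no limit. force=True models an explicit operation that
    replays past the limit, e.g. for data recovery.
    """
    applied = []
    for rec in sorted(records, key=lambda r: r.end_lsn):
        if valid_upto is not None and not force and rec.end_lsn > valid_upto:
            break  # remaining WAL was received but is not yet known to be valid
        applied.append(rec.payload)
    return applied


wal = [WalRecord(100, "A"), WalRecord(200, "B"), WalRecord(300, "C")]
print(replay(wal, valid_upto=200))              # ['A', 'B']
print(replay(wal, valid_upto=200, force=True))  # ['A', 'B', 'C']
print(replay(wal, valid_upto=None))             # ['A', 'B', 'C']
```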


> 2) When we say the async standbys will receive the WAL, will they just
> keep the received WAL in the shared memory but not apply or will they
> just write but not apply the WAL and flush the WAL to the pg_wal
> directory on the disk or will they write to some other temp wal
> directory until they receive go-ahead LSN from the primary?

I was thinking that for now it'd go to disk, but eventually would first go to
wal_buffers and only to disk if wal_buffers needs to be flushed out (and only
in that case the pg_control field would need to be set).
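
A rough sketch of that "buffer first, spill only when needed" behaviour, again as a toy Python model rather than anything resembling the real implementation. `StandbyWalBuffer`, `capacity`, `confirmed_lsn`, and `control_valid_upto` are hypothetical names; the in-memory list stands in for wal_buffers and `control_valid_upto` for the new pg_control field. How the field would later be advanced or cleared is deliberately left out.

```python
class StandbyWalBuffer:
    """Toy model: received WAL stays in memory and is written to disk only
    when the buffer overflows. The control field is set only if WAL beyond
    the confirmed LSN actually reaches disk."""

    def __init__(self, capacity: int):
        self.capacity = capacity        # max records held in memory
        self.buffered = []              # (end_lsn, payload), memory only
        self.on_disk = []               # records that were spilled to disk
        self.confirmed_lsn = 0          # latest "valid up to" LSN from the primary
        self.control_valid_upto = None  # stand-in for the pg_control field

    def confirm(self, lsn: int) -> None:
        # The primary tells us WAL up to this LSN may be flushed/applied.
        self.confirmed_lsn = max(self.confirmed_lsn, lsn)

    def receive(self, end_lsn: int, payload: str) -> None:
        self.buffered.append((end_lsn, payload))
        while len(self.buffered) > self.capacity:
            self._spill_oldest()

    def _spill_oldest(self) -> None:
        end_lsn, payload = self.buffered.pop(0)
        if end_lsn > self.confirmed_lsn and self.control_valid_upto is None:
            # Unconfirmed WAL is about to hit disk: record the limit first.
            self.control_valid_upto = self.confirmed_lsn
        self.on_disk.append((end_lsn, payload))


buf = StandbyWalBuffer(capacity=2)
buf.receive(100, "A")
buf.receive(200, "B")
buf.confirm(200)
buf.receive(300, "C")   # spills A (confirmed): field stays unset
buf.receive(400, "D")   # spills B (confirmed): field stays unset
buf.receive(500, "E")   # spills C (unconfirmed): field gets set to 200
print(buf.control_valid_upto)
```

In this model the control field is untouched as long as only confirmed WAL is evicted, which matches the point that it only needs to be set when unconfirmed WAL is forced out to disk.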


> 3) Won't the network transfer cost be wasted in case the sync standbys
> don't acknowledge flush LSN back to the primary for whatever reasons?

That should be *extremely* rare, and in that case a bit of wasted traffic
isn't going to matter.


> The proposed idea in this thread (async standbys waiting for flush LSN
> from sync standbys before sending the WAL), although it makes async
> standby slower in receiving the WAL, it doesn't have the above
> problems and is simpler to implement IMO. Since this feature is going
> to be optional with a GUC, users can enable it based on the needs.

To me it's architecturally the completely wrong direction. We should move in
the *other* direction, i.e. allow WAL to be sent to standbys before the
primary has finished flushing it locally. Which requires similar
infrastructure to what we're discussing here.
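
To make the direction concrete, here is a small Python sketch of a sender that streams WAL as soon as it is generated and separately sends an updated "valid up to" LSN. Everything here is an assumption made for illustration: the `Primary` class, the message tuples, and in particular the rule that the safe LSN is the minimum of the primary's local flush position and the position acknowledged by the sync standby. It is not the actual replication protocol.

```python
class Primary:
    """Toy sender: WAL goes out immediately; a separate 'valid_upto'
    message follows once the WAL is considered safe."""

    def __init__(self):
        self.insert_lsn = 0       # WAL generated so far
        self.local_flush_lsn = 0  # WAL flushed locally on the primary
        self.sync_flush_lsn = 0   # flush position acked by the sync standby
        self.sent_lsn = 0         # WAL already streamed to the async standby
        self.sent_valid_upto = 0  # last "valid up to" LSN sent
        self.outbox = []          # messages sent to the async standby

    def generate(self, nbytes: int) -> None:
        self.insert_lsn += nbytes
        self._stream()

    def local_flush(self) -> None:
        self.local_flush_lsn = self.insert_lsn
        self._stream()

    def sync_ack(self, lsn: int) -> None:
        self.sync_flush_lsn = max(self.sync_flush_lsn, lsn)
        self._stream()

    def _stream(self) -> None:
        # Send new WAL without waiting for any flush or acknowledgement.
        if self.insert_lsn > self.sent_lsn:
            self.outbox.append(("wal", self.sent_lsn, self.insert_lsn))
            self.sent_lsn = self.insert_lsn
        # Assumed rule: WAL is safe once flushed locally and acked by sync.
        safe = min(self.local_flush_lsn, self.sync_flush_lsn)
        if safe > self.sent_valid_upto:
            self.outbox.append(("valid_upto", safe))
            self.sent_valid_upto = safe


p = Primary()
p.generate(100)   # WAL is streamed right away
p.local_flush()   # not yet safe: the sync standby has not acked
p.sync_ack(100)   # now only the small "valid_upto" message is needed
print(p.outbox)   # [('wal', 0, 100), ('valid_upto', 100)]
```

The point of the sketch is that by the time the acknowledgement arrives, the bulk data has already been transferred, so only a short LSN update remains to be sent.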

Greetings,

Andres Freund


