Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Msg-id CAE9k0PnE1jO9qnAewng3C+z6HtN9xhrqth+H3UNd79Jc4uvzUw@mail.gmail.com
In response to Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication  (shveta malik <shveta.malik@gmail.com>)
Responses RE: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
List pgsql-hackers
On Tue, Apr 7, 2026 at 5:18 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Apr 7, 2026 at 3:56 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Apr 7, 2026 at 11:20 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Apr 7, 2026 at 9:04 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > >
> > > > I see your point. I agree that using wal_receiver_status_interval
> > > > may not be reliable for this test. Could we instead call
> > > > pg_wal_replay_pause() on the standby and then check for
> > > > wait_event=WaitForStandbyConfirmation with backend_type=walsender on
> > > > the primary? Or do you see any issues with this approach that I might
> > > > be overlooking?
> > > >
> > >
> > > Yes, I think we can make use of the WAL replay pause/resume mechanism.
> > > This seems like the right approach, as it gives us a more controlled
> > > and deterministic way to validate the lagging behavior.
> > >
> >
> > Looking at 049_wait_for_lsn.pl (the test case you referenced), it
> > explicitly stops the WAL receiver by setting primary_conninfo to an
> > empty string, rather than just pausing WAL replay.
>
> Oh, I missed that in the test case. Setting primary_conninfo to an empty
> string essentially means the walreceiver is not started, which leaves
> the standby's slot inactive, and we already have a test case for that.
>
> > Using
> > pg_wal_replay_pause() alone only halts replay; the WAL receiver
> > continues running, keeps receiving WAL, and sends feedback/status to
> > the primary. That feedback is sufficient to advance restart_lsn on the
> > standby’s slot, so the restart_lsn < wait_for_lsn condition inside
> > StandbySlotsHaveCaughtup() would no longer hold, which is not what we
> > want.
>
> Yes, I see. IIUC, the same problem arises if we use
> recovery_min_apply_delay, i.e., WAL will be received and flushed, and
> feedback will be sent to the primary; only replay will be delayed. We
> can use 'synchronous_commit = remote_apply' along with
> 'recovery_min_apply_delay', but then logical replication would be
> delayed because the transaction commit is blocked, not because the
> standby is actually lagging. That would not be a suitable test for
> 'synchronized_standby_slots'.
>

Even with synchronous_commit = remote_apply and paused replay, the
standby can still send replies to the primary that update the slot's
restart_lsn.
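
To illustrate why pausing replay alone does not hold the slot back, here
is a toy Python model. It is not PostgreSQL code: ToyStandby, simulate()
and the integer LSNs are made up for illustration, and it assumes the
primary advances a physical slot's restart_lsn from the flush position
the standby reports, independent of how far replay has progressed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyStandby:
    """Hypothetical stand-in for a physical standby (illustration only)."""
    receiver_running: bool = True   # walreceiver connected to the primary
    replay_paused: bool = False     # as if pg_wal_replay_pause() was called
    flush_lsn: int = 0              # WAL received and flushed
    replay_lsn: int = 0             # WAL actually replayed

    def stream(self, primary_lsn: int) -> None:
        # Receiving and flushing depends only on the walreceiver.
        if self.receiver_running:
            self.flush_lsn = primary_lsn
        # Replay advances only while it is not paused.
        if not self.replay_paused:
            self.replay_lsn = self.flush_lsn

    def reported_flush_lsn(self) -> Optional[int]:
        # Feedback reaches the primary only while the walreceiver runs.
        return self.flush_lsn if self.receiver_running else None


def slot_has_caught_up(restart_lsn: int, wait_for_lsn: int) -> bool:
    # The slot counts as lagging while restart_lsn < wait_for_lsn.
    return restart_lsn >= wait_for_lsn


def simulate(replay_paused: bool, receiver_running: bool,
             wait_for_lsn: int = 100) -> Tuple[int, int, bool]:
    standby = ToyStandby(receiver_running=receiver_running,
                         replay_paused=replay_paused)
    restart_lsn = 0
    standby.stream(wait_for_lsn)
    reported = standby.reported_flush_lsn()
    if reported is not None:
        # Assumption: the slot follows the reported flush position.
        restart_lsn = max(restart_lsn, reported)
    return (standby.replay_lsn, restart_lsn,
            slot_has_caught_up(restart_lsn, wait_for_lsn))


# Replay paused, walreceiver running: replay stays at 0, slot catches up.
print(simulate(replay_paused=True, receiver_running=True))    # (0, 100, True)
# Walreceiver stopped: no feedback, so the slot keeps lagging.
print(simulate(replay_paused=False, receiver_running=False))  # (0, 0, False)
```

In this model only stopping the walreceiver keeps restart_lsn below
wait_for_lsn, which matches what 049_wait_for_lsn.pl does by emptying
primary_conninfo.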

--
With Regards,
Ashutosh Sharma.


