Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uCAzkua8KAQaLNnYKOJ56x3yJ9kRfDxL8Enp5Li8bzhdQ@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 5, 2023 at 7:38 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On 12/5/23 12:32 PM, Amit Kapila wrote:
> > > On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >>
> > >> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> > >> <bertranddrouvot.pg@gmail.com> wrote:
> > >>>>
> > >>>
> > >>> Maybe another option could be to give the walreceiver a way to let the slot sync
> > >>> worker know that it (the walreceiver) was not able to start due to a non-existent
> > >>> replication slot on the primary? (that way we'd avoid the slot sync worker having
> > >>> to talk to the primary).
> > >>
> > >> Few points:
> > >> 1) I think if we do it, we should do it in generic way i.e. slotsync
> > >> worker should go to no-op if walreceiver is not able to start due to
> > >> any reason and not only due to invalid primary_slot_name.
> > >> 2) Secondly, slotsync worker needs to make sure it has synced the
> > >> slots so far i.e. worker should not go to no-op immediately on seeing
> > >> missing WalRcv process if there are pending slots to be synced.
> > >>
> > >
> > > Won't it be better to just ping and check the validity of
> > > 'primary_slot_name' at the start of slot-sync and if it is changed
> > > > anytime? I think it would be better to avoid adding a dependency on
> > > > walreceiver state as that sounds like needless complexity.
> >
> > I think the overall extra complexity is linked to the fact that we first
> > want to ensure that the slots are in sync before shutting down the
> > sync slot worker.
> >
> > I think that talking to the primary or relying on the walreceiver state
> > is "just" what would trigger the decision to shut down the sync slot worker.
> >
> > Relying on the walreceiver state looks better to me (as it avoids possibly
> > useless round trips with the primary).
> >
>
> But the round trip will happen only once at the beginning, and again if the
> user changes the GUC primary_slot_name, which shouldn't be that often.
>
> > Also the walreceiver could be down for multiple reasons, and I think there
> > is no point of having a sync slot worker running if the slots are in sync and
> > there is no walreceiver running (even if primary_slot_name is a valid one).
> >
>
> I feel that is indirectly relying on the fact that the primary won't
> advance logical slots unless physical standby has consumed data.

Yes, that is the basis of this discussion. But on rethinking, if
the user has not set 'standby_slot_names' on the primary in the first place,
then even if the walreceiver on the standby is down, slots on the primary will
keep advancing and thus we need to sync. We currently have no check
that mandates users to set standby_slot_names.

> Now,
> it is possible that the slot-sync worker lags behind and still needs to
> sync more data for slots, in which case it makes sense for the slot-sync
> worker to be alive. I think we can try to avoid checking walreceiver status
> till we can get more data to avoid the problem I mentioned but it
> doesn't sound like a clean way to achieve our purpose.
>


