Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAA4eK1+EeUUT+gnzoHKWm7GqosA2ehT8QyoKtu1jiyGE=wUErw@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Tue, Dec 5, 2023 at 7:38 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> On 12/5/23 12:32 PM, Amit Kapila wrote:
> > On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> >>
> >> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> >> <bertranddrouvot.pg@gmail.com> wrote:
> >>>>
> >>>
> >>> Maybe another option could be to give the walreceiver a way to let the slot sync
> >>> worker know that it (the walreceiver) was not able to start due to a nonexistent
> >>> replication slot on the primary? (That way we'd avoid the slot sync worker having
> >>> to talk to the primary.)
> >>
> >> Few points:
> >> 1) I think if we do it, we should do it in generic way i.e. slotsync
> >> worker should go to no-op if walreceiver is not able to start due to
> >> any reason and not only due to invalid primary_slot_name.
> >> 2) Secondly, slotsync worker needs to make sure it has synced the
> >> slots so far i.e. worker should not go to no-op immediately on seeing
> >> missing WalRcv process if there are pending slots to be synced.
> >>
> >
> > Won't it be better to just ping and check the validity of
> > 'primary_slot_name' at the start of slot-sync and if it is changed
> > anytime? I think it would be better to avoid adding dependency on
> > walreceiver state as that sounds like needless complexity.
>
> I think the overall extra complexity is linked to the fact that we first
> want to ensure that the slots are in sync before shutting down the
> sync slot worker.
>
> I think that talking to the primary or relying on the walreceiver state
> is "just" what would trigger the decision to shut down the sync slot worker.
>
> Relying on the walreceiver state looks better to me (as it avoids possibly
> useless round trips with the primary).
>

But the round trip will happen only once at the beginning, and again if the
user changes the GUC primary_slot_name, which shouldn't be that often.
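
To make the cost argument concrete, here is a minimal sketch (Python,
purely illustrative, not code from the patch) of a worker that contacts
the primary only at startup and again when primary_slot_name changes.
The names SlotSyncWorker, slot_exists_on_primary and on_config_reload
are hypothetical, and the callback merely stands in for one round trip
to the primary.

```python
from typing import Callable, Optional


class SlotSyncWorker:
    """Toy model: validate primary_slot_name once, re-validate only on change."""

    def __init__(self, slot_exists_on_primary: Callable[[str], bool]) -> None:
        # Hypothetical callback standing in for one round trip to the primary.
        self._slot_exists = slot_exists_on_primary
        self._validated_name: Optional[str] = None
        self.is_valid = False
        self.round_trips = 0

    def on_config_reload(self, primary_slot_name: str) -> bool:
        """Call at startup and after every configuration reload."""
        if primary_slot_name == self._validated_name:
            # GUC unchanged: reuse the cached result, no round trip.
            return self.is_valid
        # GUC is new or changed: pay for exactly one round trip.
        self.round_trips += 1
        self.is_valid = self._slot_exists(primary_slot_name)
        self._validated_name = primary_slot_name
        return self.is_valid
```

In this model, reloading the configuration any number of times with an
unchanged primary_slot_name costs no extra round trips; only a changed
value triggers a new check.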

> Also the walreceiver could be down for multiple reasons, and I think there
> is no point of having a sync slot worker running if the slots are in sync and
> there is no walreceiver running (even if primary_slot_name is a valid one).
>

I feel that is indirectly relying on the fact that the primary won't
advance logical slots unless the physical standby has consumed data. Now,
it is possible that the slot-sync worker lags behind and still needs to
sync more data for slots, in which case it makes sense for the slot-sync
worker to stay alive. I think we could avoid that problem by not checking
the walreceiver status as long as we can still get more data, but that
doesn't sound like a clean way to achieve our purpose.
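
As a rough illustration of the lag case (Python, a sketch under stated
assumptions rather than a proposed implementation): a stop decision
based on walreceiver state would also have to compare slot positions.
The function name and both dictionaries are hypothetical, LSNs are
modeled as plain integers, and the remote positions are assumed to be
the values the worker last fetched from the primary.

```python
from typing import Dict


def slot_sync_worker_may_stop(walreceiver_running: bool,
                              remote_confirmed_lsn: Dict[str, int],
                              local_confirmed_lsn: Dict[str, int]) -> bool:
    """Return True only when stopping cannot leave a slot behind."""
    if walreceiver_running:
        # More WAL may arrive, so slots on the primary may still advance.
        return False
    for name, remote_lsn in remote_confirmed_lsn.items():
        # A slot that is missing locally, or behind the primary, is pending.
        if local_confirmed_lsn.get(name, 0) < remote_lsn:
            return False
    return True
```

The second check is the extra state the walreceiver-based approach
would need: a stopped walreceiver by itself does not tell the worker
that every slot has caught up.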

--
With Regards,
Amit Kapila.


