Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Msg-id CAA4eK1LHb9Bb1O0GFft4H8ddtqZeLPCG2hKuSXYvj=du9MjsLw@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Tue, Nov 7, 2023 at 7:58 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> On 11/7/23 11:55 AM, Amit Kapila wrote:
> >>>
> >>> This is not a foolproof solution but an optimization over the first one. Now
> >>> in any sync-cycle, we take 2 attempts for slots-creation (if any slots
> >>> are available to be created). In first attempt, we do not wait
> >>> indefinitely on inactive slots, we wait only for a fixed amount of
> >>> time and if remote-slot is still behind, then we add that to the
> >>> pending list and move to the next slot. Once we are done with first
> >>> attempt, in second attempt, we go for the pending ones and now we wait
> >>> on each of them until the primary catches up.
> >>
> >> Aren't we "just" postponing the "issue"? I mean if there is really no activity
> >> on, say, the first created slot, then once we move to the second attempt then any newly
> >> created slot from that time would wait to be synced forever, no?
> >>
> >
> > We have to wait at some point in time for such inactive slots and the
> > same is true even for manually created slots on standby. Do you have
> > any better ideas to deal with it?
> >
>
> What about:
>
> - get rid of the second attempt and the pending_slot_list
> - keep the wait_count and PrimaryCatchupWaitAttempt logic
>
> so basically, get rid of:
>
>     /*
>      * Now sync the pending slots which were failed to be created in first
>      * attempt.
>      */
>     foreach(cell, pending_slot_list)
>     {
>         RemoteSlot *remote_slot = (RemoteSlot *) lfirst(cell);
>
>         /* Wait until the primary server catches up */
>         PrimaryCatchupWaitAttempt = 0;
>
>         synchronize_one_slot(wrconn, remote_slot, NULL);
>     }
>
> and the pending_slot_list list.
>
> That way, for each slot that has not been created and synced yet:
>
> - it will be created on the standby
> - we will wait up to PrimaryCatchupWaitAttempt attempts
> - the slot will be synced or removed on/from the standby
>
> That way an inactive slot on the primary would not "block"
> any other slots on the standby.
>
> By "created" here I mean calling ReplicationSlotCreate() (not to be confused
> with emitting "ereport(LOG, errmsg("created slot \"%s\" locally", remote_slot->name)); "
> which is confusing as mentioned up-thread).
>
> The problem I can see with this proposal is that the "sync" window waiting
> for slot activity on the primary is "only" during the PrimaryCatchupWaitAttempt
> attempts (as the slot will be dropped/recreated).
>
> If we think this window is too short we could:
>
> - increase it
> or
> - don't drop the slot once created (even if there is no activity
> on the primary during PrimaryCatchupWaitAttempt attempts) so that
> the next loop of attempts will compare with "older" LSN/xmin (as compared to
> dropping and re-creating the slot). That way the window would be since the
> initial slot creation.
>

Yeah, this sounds reasonable, but we can't mark such slots as
synced/available for use after failover. I think if we want to follow
this approach, then we also need to monitor these slots for any change
in consecutive cycles and, once we are able to sync them, enable
them for use after failover.

Another somewhat related point is that right now, we just wait for the
change on the first slot (the patch refers to it as the monitoring
slot) for computing nap_time before which we will recheck all the
slots. I think we can improve that as well such that even if any
slot's information is changed, we don't consider changing naptime.

--
With Regards,
Amit Kapila.


