Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uDEMo33g2cRJ1RhN-=U8jP7Jkh+k4Y9sCiADGfQ6m_EyQ@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Mon, Dec 11, 2023 at 1:47 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > PFA v43, changes are:
> > >
> >
> > I wanted to discuss 0003 patch about cascading standby's. It is not
> > clear to me whether we want to allow physical standbys to further wait
> > for cascading standby to sync their slots. If we allow such a feature
> > one may expect even primary to wait for all the cascading standby's
> > because otherwise still logical subscriber can be ahead of one of the
> > cascading standby. I feel even if we want to allow such a behaviour we
> > can do it later once the main feature is committed. I think it would
> > be good to just allow logical walsenders on primary to wait for
> > physical standbys represented by GUC 'standby_slot_names'. If we agree
> > on that then it would be good to prohibit setting this GUC on standby
> > or at least it should be a no-op even if this GUC should be set on
> > physical standby.
> >
> > Thoughts?
>
> IMHO, why not keep the behavior consistent across primary and standby?
>  I mean if it doesn't require a lot of new code/design addition then
> it should be the user's responsibility.  I mean if the user has set
> 'standby_slot_names' on standby then let standby also wait for
> cascading standby to sync their slots?  Is there any issue with that
> behavior?
>

Unless the primary also waits for the cascading standbys, it won't help
to wait only on the standby.

Currently, logical walsenders on the primary wait for physical standbys
to receive changes before they update their own logical slots. But they
wait only for their immediate standbys, not for cascading standbys.
On the first standby, we do have logic where slot-sync workers wait for
cascading standbys before they update their own slots (the synced ones,
see patch 0003). But this does not guarantee that logical subscribers on
the primary will never be ahead of the cascading standbys.
Consider this timeline:

t1: logical walsender on primary is waiting for standby1 (first standby).
t2: physical walsender on standby1 is stuck, so there is a delay in
sending these changes to standby2 (cascading standby).
t3: standby1 has received the changes and sends confirmation to primary.
t4: logical walsender on primary receives confirmation from standby1
and updates its slot; logical subscribers of primary also receive the
changes.
t5: standby2 has not received the changes yet as the physical walsender
on standby1 is still stuck; the slotsync worker is still waiting for
standby2 (cascading) before it updates its own slots (synced ones).
t6: standby2 is promoted to become primary.

Now we are in a state wherein the primary, the logical subscriber and
the first standby have some changes but the cascading standby does not.
And the logical slots on the primary were updated without confirming
whether the cascading standby has received the changes. This is a
problem and we do not have a simple solution for it yet.
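
To make the timeline concrete, here is a rough toy model of it in
Python. It is only a sketch and not PostgreSQL code: the Node class, the
integer "lsn" values, the stuck flag and run_timeline() are all
hypothetical names made up for illustration, and WAL positions are
reduced to plain integers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a server; lsn is the last WAL position it has received."""
    name: str
    lsn: int = 0


def run_timeline() -> dict:
    primary = Node("primary", lsn=100)   # primary has WAL up to position 100
    standby1 = Node("standby1")          # first (immediate) standby
    standby2 = Node("standby2")          # cascading standby, fed by standby1
    subscriber = Node("subscriber")      # logical subscriber of the primary
    primary_slot_lsn = 0                 # logical slot position on primary
    synced_slot_lsn = 0                  # synced copy of that slot on standby1

    # t1: logical walsender on primary waits for standby1 only.
    # t2: physical walsender on standby1 is stuck, nothing reaches standby2.
    standby1_walsender_stuck = True

    # t3: standby1 receives the changes and confirms to primary.
    standby1.lsn = primary.lsn

    # t4: the wait on primary is satisfied by the immediate standby alone,
    # so the slot advances and the subscriber receives the changes.
    if standby1.lsn >= primary.lsn:
        primary_slot_lsn = primary.lsn
        subscriber.lsn = primary.lsn

    # t5: standby2 still has nothing; the slot-sync worker on standby1
    # keeps waiting for it and does not advance the synced slot.
    if not standby1_walsender_stuck:
        standby2.lsn = standby1.lsn
    if standby2.lsn >= primary_slot_lsn:
        synced_slot_lsn = primary_slot_lsn

    # t6: standby2 is promoted to become the new primary.
    new_primary = standby2

    return {
        "primary_slot_lsn": primary_slot_lsn,
        "synced_slot_lsn": synced_slot_lsn,
        "subscriber_lsn": subscriber.lsn,
        "new_primary_lsn": new_primary.lsn,
        "subscriber_ahead": subscriber.lsn > new_primary.lsn,
    }


if __name__ == "__main__":
    print(run_timeline())
```

In this model the wait on standby1 does hold back the synced slot, but
the subscriber still ends up ahead of the promoted standby2, because the
check at t4 never looks beyond standby1.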

thanks
Shveta


