RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Synchronizing slots from primary to standby
Date
Msg-id OS0PR01MB57160CC95DAD0E3B6D60DBAB94B7A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote:
> On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote:
> > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
> >> Yeah good point, agree to just error out in all the case then (if we
> >> discard the sync_ reserved wording proposal, which seems to be the
> >> case as probably not worth the extra work).
> >
> > Thanks for the discussion!
> >
> > Here is the V33 patch set which includes the following changes:
> 
> Thanks for working on it!
> 
> >
> > 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver.
> 
> @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr
> RecPtr, bool randAccess,
>                       * this only after failure, so when you promote, we still
>                       * finish replaying as much as we can from archive and
>                       * pg_wal before failover.
> +                    *
> +                    * Drop the slots for which sync is initiated but not yet
> +                    * completed i.e. they are still waiting for the primary
> +                    * server to catch up.
>                       */
>                      if (StandbyMode && CheckForStandbyTrigger())
>                      {
>                          XLogShutdownWalRcv();
> +                       slotsync_drop_initiated_slots();
>                          return XLREAD_FAIL;
>                      }
> 
> I had a closer look and it seems this is not located at the right place.
> 
> Indeed, it's added here:
> 
> switch (currentSource)
> {
>     case XLOG_FROM_ARCHIVE:
>     case XLOG_FROM_PG_WAL:
> 
> While in our case we are in
> 
>     case XLOG_FROM_STREAM:
> 
> So I think we should move slotsync_drop_initiated_slots() in the
> XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker?
> (the TODO item number 2 you mentioned up-thread)

Thanks for the comment.

I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown
slotsync worker and drop slots. There could be other reasons(other than
promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code
there. I thought if the intention is to stop slotsync workers on promotion,
maybe FinishWalRecovery() is a better place to do it as it's indicating the end
of recovery and XLogShutdownWalRcv is also called in it.

And I feel we'd better drop the slots after shutting down the slotsync workers,
because otherwise the slotsync workers could create the dropped slot again in
rare cases.

Best Regards,
Hou zj



pgsql-hackers by date:

Previous
From: Jubilee Young
Date:
Subject: Re: Hide exposed impl detail of wchar.c
Next
From: Richard Guo
Date:
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500