Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id 46070646-9e09-4566-8a62-ae31a12a510c@gmail.com
Whole thread Raw
In response to RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby
RE: Synchronizing slots from primary to standby
List pgsql-hackers
Hi,

On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote:
> On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote:
>> Yeah good point, agree to just error out in all the case then (if we discard the
>> sync_ reserved wording proposal, which seems to be the case as probably not
>> worth the extra work).
> 
> Thanks for the discussion!
> 
> Here is the V33 patch set which includes the following changes:

Thanks for working on it!

> 
> 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver.

@@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                      * this only after failure, so when you promote, we still
                      * finish replaying as much as we can from archive and
                      * pg_wal before failover.
+                    *
+                    * Drop the slots for which sync is initiated but not yet
+                    * completed i.e. they are still waiting for the primary
+                    * server to catch up.
                      */
                     if (StandbyMode && CheckForStandbyTrigger())
                     {
                         XLogShutdownWalRcv();
+                       slotsync_drop_initiated_slots();
                         return XLREAD_FAIL;
                     }

I had a closer look and it seems this is not located at the right place.

Indeed, it's added here:

switch (currentSource)
{
    case XLOG_FROM_ARCHIVE:
    case XLOG_FROM_PG_WAL:

While in our case we are in

    case XLOG_FROM_STREAM:

So I think we should move slotsync_drop_initiated_slots() in the
XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker?
(the TODO item number 2 you mentioned up-thread)

BTW in order to prevent any corner case, would'nt also be better to

replace:

+   /*
+    * Do not allow consumption of a "synchronized" slot until the standby
+    * gets promoted.
+    */
+   if (RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE))

with something like:

if ((RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || slot->data.sync_state ==
SYNCSLOT_STATE_INITIATED)

to ensure slots in 'i' case can never be used?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Array initialisation notation in syscache.c
Next
From: Nathan Bossart
Date:
Subject: Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro