RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Synchronizing slots from primary to standby
Date
Msg-id OS0PR01MB57165AA821D881520E38FE2E94462@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Monday, February 5, 2024 10:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
lvh.no-ip.org>
> Subject: Re: Synchronizing slots from primary to standby
> 
> On Mon, Feb 5, 2024 at 8:26 PM shveta malik <shveta.malik@gmail.com>
> wrote:
> >
> > On Mon, Feb 5, 2024 at 10:57 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > >
> > > Just noticed that doc/src/sgml/config.sgml still refers to enable_synclot
> instead of sync_replication_slots:
> > >
> > > The standbys corresponding to the physical replication slots in
> > > <varname>standby_slot_names</varname> must configure
> > > <literal>enable_syncslot = true</literal> so they can receive
> > > failover logical slots changes from the primary.
> >
> > Thanks Ajin for pointing this out. Here are v78 patches, corrected there.
> >
> > Other changes are:
> >
> > 1)  Rebased the patches as the v77-001 is now pushed.
> > 2)  Enabled executing pg_sync_replication_slots() on cascading-standby.
> > 3)  Rearranged the code around parameter validity checks. Changed
> > function names and changed the way how dbname is extracted as
> > suggested by Amit offlist.
> > 4)  Rearranged the code around check_primary_info(). Removed output
> args.
> > 5)  Few other trivial changes.
> >
> 
> Thank you for updating the patch! Here are some comments:
> 
> ---
> Since Two processes (e.g. the slotsync worker and
> pg_sync_replication_slots()) concurrently fetch and update the slot information,
> there is a race condition where slot's confirmed_flush_lsn goes backward. . We
> have the following check but it doesn't prevent the slot's confirmed_flush_lsn
> from moving backward if the restart_lsn does't change:
> 
>             /*
>              * Sanity check: As long as the invalidations are handled
>              * appropriately as above, this should never happen.
>              */
>             if (remote_slot->restart_lsn < slot->data.restart_lsn)
>                 elog(ERROR,
>                      "cannot synchronize local slot \"%s\" LSN(%X/%X)"
>                      " to remote slot's LSN(%X/%X) as synchronization"
>                      " would move it backwards", remote_slot->name,
>                      LSN_FORMAT_ARGS(slot->data.restart_lsn),
>                      LSN_FORMAT_ARGS(remote_slot->restart_lsn));
> 

As discussed, I added a flag in shared memory to control the concurrent slot sync.

> ---
> +     It is recommended that subscriptions are first disabled before
> + promoting
> f+     the standby and are enabled back after altering the connection string.
> 
> I think it's better to describe the reason why it's recommended to disable
> subscriptions before the standby promotion.

Added.

> 
> ---
> +/* Slot sync worker objects */
> +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char
> +*PrimarySlotName;
> 
> These two variables are declared also in xlogrecovery.h. Is it intentional? If so, I
> think it's better to write comments.

Will address.

> 
> ---
> Global functions and variables used by the slotsync worker are declared in
> logicalworker.h and worker_internal.h. But is it really okay to make a
> dependency between the slotsync worker and logical replication workers? IIUC
> the slotsync worker is conceptually a separate feature from the logical
> replication. I think the slotsync worker can have its own header file.

Added.

> 
> ---
> +               SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid ||
> '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname
> 
> and
> 
> +               SELECT (CASE WHEN r.srsubstate = 'f' THEN
> pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || r.srrelid), false)
> 
> If we use CONCAT function, we can replace '||' with ','.
> 

Will address.

> ---
> +     Confirm that the standby server is not lagging behind the subscribers.
> +     This step can be skipped if
> +     <link
> linkend="guc-standby-slot-names"><varname>standby_slot_names</varna
> me></link>
> +     has been correctly configured.
> 
> How can the user confirm if standby_slot_names is correctly configured?

Will address after concluding.

Thanks Shveta for helping addressing the comments.

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Synchronizing slots from primary to standby
Next
From: Mats Kindahl
Date:
Subject: glibc qsort() vulnerability