RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Synchronizing slots from primary to standby
Date
Msg-id OS0PR01MB5716AAC5E4C73CC7DAD0855294222@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Tuesday, March 5, 2024 8:40 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Here are some review comments for v105-0001
> 
> ==========
> doc/src/sgml/config.sgml
> 
> 1.
> +       <para>
> +        The standbys corresponding to the physical replication slots in
> +        <varname>standby_slot_names</varname> must configure
> +        <literal>sync_replication_slots = true</literal> so they can receive
> +        logical failover slots changes from the primary.
> +       </para>
> 
> /slots changes/slot changes/

Changed.

> 
> ======
> doc/src/sgml/func.sgml
> 
> 2.
> +        The function may be waiting if the specified slot is a logical failover
> +        slot and <link
> linkend="guc-standby-slot-names"><varname>standby_slot_names</varna
> me></link>
> +        is configured.
> 
> I know this has been through multiple versions already, but this
> latest wording "may be waiting..." doesn't seem very good to me.
> 
> How about one of these?
> 
> * The function may not be able to return immediately if the specified
> slot is a logical failover slot and standby_slot_names is configured.
> 
> * The function return might be blocked if the specified slot is a
> logical failover slot and standby_slot_names is configured.
> 
> * If the specified slot is a logical failover slot then the function
> will block until all physical slots specified in standby_slot_names
> have confirmed WAL receipt.
> 
> * If the specified slot is a logical failover slot then the function
> will not return until all physical slots specified in
> standby_slot_names have confirmed WAL receipt.

I prefer the last one.


> 
> ~~~
> 
> 3.
> +        slot may return to an earlier position. The function may be waiting if
> +        the specified slot is a logical failover slot and
> +        <link
> linkend="guc-standby-slot-names"><varname>standby_slot_names</varna
> me></link>
> 
> 
> Same as previous review comment #2

Changed.

> 
> ======
> src/backend/replication/slot.c
> 
> 4. WaitForStandbyConfirmation
> 
> + * Used by logical decoding SQL functions that acquired logical failover slot.
> 
> IIUC it doesn't work like that. pg_logical_slot_get_changes_guts()
> calls here unconditionally (i.e. the SQL functions don't even check if
> they are failover slots before calling this) so the comment seems
> misleading/redundant.

I removed the "acquired logical failover slot.".

> 
> ======
> src/backend/replication/walsender.c
> 
> 5. NeedToWaitForWal
> 
> + /*
> + * Check if the standby slots have caught up to the flushed position. It
> + * is good to wait up to the flushed position and then let the WalSender
> + * send the changes to logical subscribers one by one which are already
> + * covered by the flushed position without needing to wait on every change
> + * for standby confirmation.
> + */
> + if (NeedToWaitForStandbys(flushed_lsn, wait_event))
> + return true;
> +
> + *wait_event = 0;
> + return false;
> +}
> +
> 
> 5a.
> The comment (or part of it?) seems misplaced because it is talking
> WalSender sending changes, but that is not happening in this function.
> 
> Also, isn't what this is saying already described by the other comment
> in the caller? e.g.:
> 
> + /*
> + * Within the loop, we wait for the necessary WALs to be flushed to
> + * disk first, followed by waiting for standbys to catch up if there
> + * are enough WALs or upon receiving the shutdown signal. To avoid the
> + * scenario where standbys need to catch up to a newer WAL location in
> + * each iteration, we update our idea of the currently flushed
> + * position only if we are not waiting for standbys to catch up.
> + */
> 

I moved these comments based on Shveta's suggestion.

> ~
> 
> 5b.
> Most of the code is unnecessary. AFAICT all this is exactly same as just 1 line:
> 
> return NeedToWaitForStandbys(flushed_lsn, wait_event);

Changed.

> 
> ~~~
> 
> 6. WalSndWaitForWal
> 
> + /*
> + * Within the loop, we wait for the necessary WALs to be flushed to
> + * disk first, followed by waiting for standbys to catch up if there
> + * are enough WALs or upon receiving the shutdown signal. To avoid the
> + * scenario where standbys need to catch up to a newer WAL location in
> + * each iteration, we update our idea of the currently flushed
> + * position only if we are not waiting for standbys to catch up.
> + */
> 
> Regarding that 1st sentence: maybe this logic used to be done
> explicitly "within the loop" but IIUC this logic is now hidden inside
> NeedToWaitForWal() so the comment should mention that.

Changed.

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Synchronizing slots from primary to standby
Next
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Synchronizing slots from primary to standby