Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAHut+PsATK8z1TEcfFE8zWoS1hagqsvaWYCgom_zYtScfwO7uQ@mail.gmail.com
Whole thread Raw
In response to RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Synchronizing slots from primary to standby
RE: Synchronizing slots from primary to standby
List pgsql-hackers
Here are some review comments for v105-0001

==========
doc/src/sgml/config.sgml

1.
+       <para>
+        The standbys corresponding to the physical replication slots in
+        <varname>standby_slot_names</varname> must configure
+        <literal>sync_replication_slots = true</literal> so they can receive
+        logical failover slots changes from the primary.
+       </para>

/slots changes/slot changes/

======
doc/src/sgml/func.sgml

2.
+        The function may be waiting if the specified slot is a logical failover
+        slot and <link
linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>
+        is configured.

I know this has been through multiple versions already, but this
latest wording "may be waiting..." doesn't seem very good to me.

How about one of these?

* The function may not be able to return immediately if the specified
slot is a logical failover slot and standby_slot_names is configured.

* The function return might be blocked if the specified slot is a
logical failover slot and standby_slot_names is configured.

* If the specified slot is a logical failover slot then the function
will block until all physical slots specified in standby_slot_names
have confirmed WAL receipt.

* If the specified slot is a logical failover slot then the function
will not return until all physical slots specified in
standby_slot_names have confirmed WAL receipt.

~~~

3.
+        slot may return to an earlier position. The function may be waiting if
+        the specified slot is a logical failover slot and
+        <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>


Same as previous review comment #2

======
src/backend/replication/slot.c

4. WaitForStandbyConfirmation

+ * Used by logical decoding SQL functions that acquired logical failover slot.

IIUC it doesn't work like that. pg_logical_slot_get_changes_guts()
calls here unconditionally (i.e. the SQL functions don't even check if
they are failover slots before calling this) so the comment seems
misleading/redundant.

======
src/backend/replication/walsender.c

5. NeedToWaitForWal

+ /*
+ * Check if the standby slots have caught up to the flushed position. It
+ * is good to wait up to the flushed position and then let the WalSender
+ * send the changes to logical subscribers one by one which are already
+ * covered by the flushed position without needing to wait on every change
+ * for standby confirmation.
+ */
+ if (NeedToWaitForStandbys(flushed_lsn, wait_event))
+ return true;
+
+ *wait_event = 0;
+ return false;
+}
+

5a.
The comment (or part of it?) seems misplaced because it is talking
WalSender sending changes, but that is not happening in this function.

Also, isn't what this is saying already described by the other comment
in the caller? e.g.:

+ /*
+ * Within the loop, we wait for the necessary WALs to be flushed to
+ * disk first, followed by waiting for standbys to catch up if there
+ * are enough WALs or upon receiving the shutdown signal. To avoid the
+ * scenario where standbys need to catch up to a newer WAL location in
+ * each iteration, we update our idea of the currently flushed
+ * position only if we are not waiting for standbys to catch up.
+ */

~

5b.
Most of the code is unnecessary. AFAICT all this is exactly same as just 1 line:

return NeedToWaitForStandbys(flushed_lsn, wait_event);

~~~

6. WalSndWaitForWal

+ /*
+ * Within the loop, we wait for the necessary WALs to be flushed to
+ * disk first, followed by waiting for standbys to catch up if there
+ * are enough WALs or upon receiving the shutdown signal. To avoid the
+ * scenario where standbys need to catch up to a newer WAL location in
+ * each iteration, we update our idea of the currently flushed
+ * position only if we are not waiting for standbys to catch up.
+ */

Regarding that 1st sentence: maybe this logic used to be done
explicitly "within the loop" but IIUC this logic is now hidden inside
NeedToWaitForWal() so the comment should mention that.

----------
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: [PATCH] updates to docs about HOT updates for BRIN
Next
From: Michael Paquier
Date:
Subject: Re: Fix race condition in InvalidatePossiblyObsoleteSlot()