Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAA4eK1++T5K1SKE_=y8PPZa-VxkFrRPA8XjbttUCx9VpVXS-yw@mail.gmail.com
Whole thread Raw
In response to RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses RE: Synchronizing slots from primary to standby
List pgsql-hackers
On Thu, Nov 2, 2023 at 2:35 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Here is the new version patch set(V29) which addressed Peter comments[1][2] and
> fixed one doc compile error.
>

Few comments:
==============
1.
+       <varlistentry id="sql-createsubscription-params-with-failover">
+        <term><literal>failover</literal> (<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the replication slot assocaited with the
subscription
+          is enabled to be synced to the physical standbys so that logical
+          replication can be resumed from the new primary after failover.
+          The default is <literal>true</literal>.

Why do you think it is a good idea to keep the default value as true?
I think the user needs to enable standby for syncing slots which is
not a default feature, so by default, the failover property should
also be false. AFAICS, it is false for create_slot SQL API as per the
below change; so that way also keeping default true for a subscription
doesn't make sense.
@@ -479,6 +479,7 @@ CREATE OR REPLACE FUNCTION
pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
     IN twophase boolean DEFAULT false,
+    IN failover boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)

BTW, the below change indicates that the code treats default as false;
so, it seems to be a documentation error.
@@ -157,6 +158,8 @@ parse_subscription_options(ParseState *pstate,
List *stmt_options,
  opts->runasowner = false;
  if (IsSet(supported_opts, SUBOPT_ORIGIN))
  opts->origin = pstrdup(LOGICALREP_ORIGIN_ANY);
+ if (IsSet(supported_opts, SUBOPT_FAILOVER))
+ opts->failover = false;

2.
-
 /*
  * Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
  *

Spurious line removal.

3.
+ else if (opts.slot_name && failover_enabled)
+ {
+ walrcv_alter_slot(wrconn, opts.slot_name, opts.failover);
+ ereport(NOTICE,
+ (errmsg("altered replication slot \"%s\" on publisher",
+ opts.slot_name)));
+ }

I think we can add a comment to describe why it makes sense to enable
the failover property of the slot in this case. Can we change the
notice message to: "enabled failover for replication slot \"%s\" on
publisher"

4.
 libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
- bool temporary, bool two_phase, CRSSnapshotAction snapshot_action,
- XLogRecPtr *lsn)
+ bool temporary, bool two_phase, bool failover,
+ CRSSnapshotAction snapshot_action, XLogRecPtr *lsn)
 {
  PGresult   *res;
  StringInfoData cmd;
@@ -913,7 +917,14 @@ libpqrcv_create_slot(WalReceiverConn *conn, const
char *slotname,
  else
  appendStringInfoChar(&cmd, ' ');
  }
-
+ if (failover)
+ {
+ appendStringInfoString(&cmd, "FAILOVER");
+ if (use_new_options_syntax)
+ appendStringInfoString(&cmd, ", ");
+ else
+ appendStringInfoChar(&cmd, ' ');
+ }

I don't see a corresponding change in repl_gram.y. I think the
following part of the code needs to be changed:
/* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */
| K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT
create_slot_options

You also need to update the docs for the same. See [1].

5.
@@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo
fcinfo, bool confirm, bool bin
  NameStr(MyReplicationSlot->data.plugin),
  format_procedure(fcinfo->flinfo->fn_oid))));
..
+ if (XLogRecPtrIsInvalid(upto_lsn))
+ wal_to_wait = end_of_wal;
+ else
+ wal_to_wait = Min(upto_lsn, end_of_wal);
+
+ /* Initialize standby_slot_names_list */
+ SlotSyncInitConfig();
+
+ /*
+ * Wait for specified streaming replication standby servers (if any)
+ * to confirm receipt of WAL upto wal_to_wait.
+ */
+ WalSndWaitForStandbyConfirmation(wal_to_wait);
+
+ /*
+ * The memory context used to allocate standby_slot_names_list will be
+ * freed at the end of this call. So free and nullify the list in
+ * order to avoid usage of freed list in the next call to this
+ * function.
+ */
+ SlotSyncFreeConfig();

What if there is an error in WalSndWaitForStandbyConfirmation() before
calling SlotSyncFreeConfig()? I think the problem you are trying to
avoid by freeing it here can occur. I think it is better to do this in
a logical decoding context and free the list along with it as we are
doing in commit c7256e6564(see PG15). Also, it is better to allocate
this list somewhere in WalSndWaitForStandbyConfirmation(), probably in
WalSndGetStandbySlots, that will make the code look neat and also
avoid allocating this list when failover is not enabled for the slot.

6.
+/* ALTER_REPLICATION_SLOT slot */
+alter_replication_slot:
+ K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')'

I think you need to update the docs for this new command. See existing docs [1].

[1] - https://www.postgresql.org/docs/devel/protocol-replication.html

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: Popcount optimization using AVX512
Next
From: Nisha Moond
Date:
Subject: Re: Intermittent failure with t/003_logical_slots.pl test on windows