Home > mailing lists
Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From	shveta malik
Subject	Re: Synchronizing slots from primary to standby
Date	December 15, 2023 05:39:12
Msg-id	CAJpy0uCKGo3rjSuM42i2UL5M7oeLTxV3Sht+=O9f1yu=fWPPjg@mail.gmail.com Whole thread
In response to	Re: Synchronizing slots from primary to standby (Peter Smith <smithpb2250@gmail.com>)
List	pgsql-hackers
Tree view
On Wed, Dec 13, 2023 at 3:53 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shveta, here are some review comments for v45-0002.
>

Thanks for the feedback. Addressed these in v48. Please find my
comments on some.

> ======
> doc/src/sgml/bgworker.sgml
>
> 1.
> +   <variablelist>
> +    <varlistentry>
> +     <term><literal>BgWorkerStart_PostmasterStart</literal></term>
> +     <listitem>
> +      <para>
> +       <indexterm><primary>BgWorkerStart_PostmasterStart</primary></indexterm>
> +       Start as soon as postgres itself has finished its own initialization;
> +       processes requesting this are not eligible for database connections.
> +      </para>
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry>
> +     <term><literal>BgWorkerStart_ConsistentState</literal></term>
> +     <listitem>
> +      <para>
> +       <indexterm><primary>BgWorkerStart_ConsistentState</primary></indexterm>
> +       Start as soon as a consistent state has been reached in a hot-standby,
> +       allowing processes to connect to databases and run read-only queries.
> +      </para>
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry>
> +     <term><literal>BgWorkerStart_RecoveryFinished</literal></term>
> +     <listitem>
> +      <para>
> +       <indexterm><primary>BgWorkerStart_RecoveryFinished</primary></indexterm>
> +       Start as soon as the system has entered normal read-write state. Note
> +       that the <literal>BgWorkerStart_ConsistentState</literal> and
> +      <literal>BgWorkerStart_RecoveryFinished</literal> are equivalent
> +       in a server that's not a hot standby.
> +      </para>
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry>
> +     <term><literal>BgWorkerStart_ConsistentState_HotStandby</literal></term>
> +     <listitem>
> +      <para>
> +       <indexterm><primary>BgWorkerStart_ConsistentState_HotStandby</primary></indexterm>
> +       Same meaning as <literal>BgWorkerStart_ConsistentState</literal> but
> +       it is more strict in terms of the server i.e. start the worker only
> +       if it is hot-standby.
> +      </para>
> +     </listitem>
> +    </varlistentry>
> +   </variablelist>
>
> Maybe reorder these slightly, because I felt it is better if the
> BgWorkerStart_ConsistentState_HotStandby comes next after
> BgWorkerStart_ConsistentState, which it refers to
>
> For example::
> 1st.BgWorkerStart_PostmasterStart
> 2nd.BgWorkerStart_ConsistentState
> 3rd.BgWorkerStart_ConsistentState_HotStandby
> 4th.BgWorkerStart_RecoveryFinished
>
> ======
> doc/src/sgml/config.sgml
>
> 2.
> <varname>enable_syncslot</varname> = true
>
> Not sure, but I thought the "= true" part should be formatted too.
>
> SUGGESTION
> <literal>enable_syncslot = true</literal>
>
> ======
> doc/src/sgml/logicaldecoding.sgml
>
> 3.
> +    <para>
> +     A logical replication slot on the primary can be synchronized to the hot
> +     standby by enabling the failover option during slot creation and setting
> +     <varname>enable_syncslot</varname> on the standby. For the synchronization
> +     to work, it is mandatory to have a physical replication slot between the
> +     primary and the standby. It's highly recommended that the said physical
> +     replication slot is listed in <varname>standby_slot_names</varname> on
> +     the primary to prevent the subscriber from consuming changes faster than
> +     the hot standby. Additionally, <varname>hot_standby_feedback</varname>
> +     must be enabled on the standby for the slots synchronization to work.
> +    </para>
>
> I felt those parts that describe the mandatory GUCs should be kept together.
>
> SUGGESTION
> For the synchronization to work, it is mandatory to have a physical
> replication slot between the primary and the standby, and
> <varname>hot_standby_feedback</varname> must be enabled on the
> standby.
>
> It's also highly recommended that the said physical replication slot
> is named in <varname>standby_slot_names</varname> list on the primary,
> to prevent the subscriber from consuming changes faster than the hot
> standby.
>
> ~~~
>
> 4. (Chapter 49)
>
> By enabling synchronization of slots, logical replication can be
> resumed after failover depending upon the
> pg_replication_slots.sync_state for the synchronized slots on the
> standby at the time of failover. Only slots that were in ready
> sync_state ('r') on the standby before failover can be used for
> logical replication after failover. However, the slots which were in
> initiated sync_state ('i') and not sync-ready ('r') at the time of
> failover will be dropped and logical replication for such slots can
> not be resumed after failover. This applies to the case where a
> logical subscription is disabled before failover and is enabled after
> failover. If the synchronized slot due to disabled subscription could
> not be made sync-ready ('r') on standby, then the subscription can not
> be resumed after failover even when enabled. If the primary is idle,
> then the synchronized slots on the standby may take a noticeable time
> to reach the ready ('r') sync_state. This can be sped up by calling
> the pg_log_standby_snapshot function on the primary.
>
> ~
>
> Somehow, I still felt all that was too wordy/repetitive. Below is my
> attempt to make it more concise. Thoughts?
>
> SUGGESTION
> The ability to resume logical replication after failover depends upon
> the pg_replication_slots.sync_state value for the synchronized slots
> on the standby at the time of failover. Only slots that have attained
> a "ready" sync_state ('r') on the standby before failover can be used
> for logical replication after failover. Slots that have not yet
> reached 'r' state (they are still 'i') will be dropped, therefore
> logical replication for those slots cannot be resumed. For example, if
> the synchronized slot could not become sync-ready on standby due to a
> disabled subscription, then the subscription cannot be resumed after
> failover even when it is enabled.
>
> If the primary is idle, the synchronized slots on the standby may take
> a noticeable time to reach the ready ('r') sync_state. This can be
> sped up by calling the pg_log_standby_snapshot function on the
> primary.
>
> ======
> doc/src/sgml/system-views.sgml
>
> 5.
> +      <para>
> +      Defines slot synchronization state. This is meaningful on the physical
> +      standby which has configured <varname>enable_syncslot</varname> = true
> +      </para>
>
> As mentioned in the previous review comment ([1]#10) I thought it
> might be good to include a hyperlink cross-reference to the
> 'enable_syncslot' GUC.
>
> ~~~
>
> 6.
> +      <para>
> +      The hot standby can have any of these sync_state for the slots but on a
> +      hot standby, the slots with state 'r' and 'i' can neither be used for
> +      logical decoding nor dropped by the user. The primary server will have
> +      sync_state as 'n' for all the slots. But if the standby is promoted to
> +      become the new primary server, sync_state can be seen 'r' as well. On
> +      this new primary server, slots with sync_state as 'r' and 'n' will
> +      behave the same.
> +      </para></entry>
>
> 6a.
> /these sync_state for the slots/these sync_state values for the slots/
>
> ~
>
> 6b
> Hm. I still felt (same as previous review [1]#12b) that there seems
> too much information here.
>
> IIUC the sync_state is only meaningful on the standby. Sure, it might
> have some values line 'n' or 'r' on the primary also, but those either
> mean nothing ('n') or are leftover states from a previous failover
> from a standby ('r'), which also means nothing. So can't we just say
> it more succinctly like that?
>
> SUGGESTION
> The sync_state has no meaning on the primary server; the primary
> sync_state value is default 'n' for all slots but may (if leftover
> from a promoted standby)  also be 'r'.
>
> ======
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 7.
>  static void
>  libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname,
> - bool failover)
> + bool failover)
>
> Still seems to be tampering with indentation that should only be in patch 0001.
>
> ======
> src/backend/replication/logical/slotsync.c
>
> 8. wait_for_primary_slot_catchup
>
> The meaning of the boolean return of this function is still not
> described by the function comment.
>
> ~~~
>
> 9.
> + * If passed, *wait_attempts_exceeded will be set to true only if this
> + * function exits after exhausting its wait attempts. It will be false
> + * in all the other cases like failure, remote-slot invalidation, primary
> + * could catch up.
>
> The above already says when a return false happens, so it seems
> overkill to give more information.
>
> SUGGESTION
> If passed, *wait_attempts_exceeded will be set to true only if this
> function exits due to exhausting its wait attempts. It will be false
> in all the other cases.
>
> ~~~
>
> 10.
>
> +static bool
> +wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
> +   bool *wait_attempts_exceeded)
> +{
> +#define WAIT_OUTPUT_COLUMN_COUNT 4
> +#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5
> +
>
> 10a
> Maybe the long constant name is too long. How about
> WAIT_PRIMARY_CATCHUP_ATTEMPTS?
>
> ~~~
>
> 10b.
> IMO it is better to Assert the input value of this kind of side-effect
> return parameter, to give a better understanding and to prevent future
> accidents.
>
> SUGGESTION
> Assert(wait_attempts_exceeded == NULL |} *wait_attempts_exceeded == false);
>
> ~~~
>
> 11. synchronize_one_slot
>
> + ReplicationSlot *s;
> + ReplicationSlot *slot;
> + char sync_state = '\0';
>
> 11a.
> I don't think you need both 's' and 'slot' ReplicationSlot -- it looks
> a bit odd. Can't you just reuse the one 'slot' variable?
>
> ~
>
> 11b.
> Also, maybe those assignment like
> + slot = MyReplicationSlot;
>
> can have an explanatory comment like:
> /* For convenience, we assign MyReplicationSlot to a shorter variable name. */
>

I have changed it to slightly simpler one, if that is okay?

> ~~~
>
> 12.
> +static void
> +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
> + bool *slot_updated)
> +{
> + ReplicationSlot *s;
> + ReplicationSlot *slot;
> + char sync_state = '\0';
>
> In my previous review [1]#33a I thought it was strange to assign the
> sync_state (which is essentially an enum) to some meaningless value,
> so I suggested it should be set to SYNCSLOT_STATE_NONE in the
> declaration. The reply [2] was "No, that will change the flow. It
> should stay uninitialized if the slot is not found."
>
> But I am not convinced there is any flow problem. Also,
> SYNCSLOT_STATE_NONE seems the naturally correct default for something
> with no state. It cannot be found and be SYNCSLOT_STATE_NONE at the
> same time (that is reported as an ERROR "skipping sync of slot") so I
> see no problem.
>
> The CURRENT code is like this:
>
> /* Slot created by the slot sync worker exists, sync it */
> if (sync_state)
> {
>   Assert(sync_state == SYNCSLOT_STATE_READY || sync_state ==
> SYNCSLOT_STATE_INITIATED);
>   ...
> }
> /* Otherwise create the slot first. */
> else
> {
>   ...
> }
>
> AFAICT that could easily be changed to like below, with no change to
> the logic, and it avoids setting strange values.
>
> SUGGESTION.
>
> if (sync_state == SYNCSLOT_STATE_NONE)
> {
>   /* Slot not found. Create it. */
>   ..
> }
> else
> {
>   Assert(sync_state == SYNCSLOT_STATE_READY || sync_state ==
> SYNCSLOT_STATE_INITIATED);
>   ...
> }
>

I have restructured the entire code here and thus initialization of
sync_state is no longer needed. Please review now and let me know.

> ~~~
>
> 13. synchronize_one_slot
>
> +static void
> +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
> + bool *slot_updated)
>
> This *slot_updated parameter looks dubious. It is used in a loop from
> the caller to mean that ANY slot was updated -- e.g. maybe it is true
> or false on entry to this function.
>
> But, Instead of having some dependency between this function and the
> caller, IMO it makes more sense if we would make this just a boolean
> function in the first place (e.g. was updated? T/F)
>
> Then the caller can also be written more easily like:
>
> some_slot_updated |= synchronize_one_slot(wrconn, remote_slot);
>
> ~~~
>
> 14.
> + /* Search for the named slot */
> + if ((s = SearchNamedReplicationSlot(remote_slot->name, true)))
> + {
> + SpinLockAcquire(&s->mutex);
> + sync_state = s->data.sync_state;
> + SpinLockRelease(&s->mutex);
> +
> + /* User created slot with the same name exists, raise ERROR. */
> + if (sync_state == SYNCSLOT_STATE_NONE)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("skipping sync of slot \"%s\" as it is a user created"
> + " slot", remote_slot->name),
> + errdetail("This slot has failover enabled on the primary and"
> +    " thus is sync candidate but user created slot with"
> +    " the same name already exists on the standby")));
> + }
> + }
>
>
> Extra curly brackets around the ereport are not needed.
>
> ~~~
>
> 15.
> + /*
> + * Sanity check: With hot_standby_feedback enabled and
> + * invalidations handled apropriately as above, this should never
> + * happen.
> + */
> + if (remote_slot->restart_lsn < slot->data.restart_lsn)
> + {
> + elog(ERROR,
> + "not synchronizing local slot \"%s\" LSN(%X/%X)"
> + " to remote slot's LSN(%X/%X) as synchronization "
> + " would move it backwards", remote_slot->name,
> + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> + LSN_FORMAT_ARGS(remote_slot->restart_lsn));
> + }
>
> 15a.
> /apropriately/appropriately/
>
> ~
>
> 15b.
> Extra curly brackets around the elog are not needed.
>
> ~~~
>
> 16. synchronize_slots
>
> +static bool
> +synchronize_slots(WalReceiverConn *wrconn)
> +{
> +#define SLOTSYNC_COLUMN_COUNT 9
> + Oid slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID,
> + LSNOID, XIDOID, BOOLOID, BOOLOID, TEXTOID, INT2OID};
> +
> + WalRcvExecResult *res;
> + TupleTableSlot *tupslot;
> + StringInfoData s;
> + List    *remote_slot_list = NIL;
> + MemoryContext oldctx = CurrentMemoryContext;
> + ListCell   *lc;
> + bool slot_updated = false;
>
> Suggest renaming 'slot_updated' to 'some_slot_updated' or
> 'update_occurred' etc because the current name makes it look like it
> applies to a single slot, but it doesn't.
>
> ~~~
>
> 17.
> + SpinLockAcquire(&WalRcv->mutex);
> + if (!WalRcv ||
> + (WalRcv->slotname[0] == '\0') ||
> + XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
> + {
> + SpinLockRelease(&WalRcv->mutex);
> + return slot_updated;
> + }
> + SpinLockRelease(&WalRcv->mutex);
>
> IMO "return false;" here is more clear than saying "return slot_updated;"
>
> ~~~
>
> 18.
> + appendStringInfo(&s,
> + "SELECT slot_name, plugin, confirmed_flush_lsn,"
> + " restart_lsn, catalog_xmin, two_phase, failover,"
> + " database, pg_get_slot_invalidation_cause(slot_name)"
> + " FROM pg_catalog.pg_replication_slots"
> + " WHERE failover and sync_state != 'i'");
>
> 18a.
> /and/AND/
>
> ~
>
> 18b.
> In the reply post (see [2]#32) Shveta said "I could not find quote_*
> function for a character just like we have 'quote_literal_cstr' for
> string". If you still want to use constant substitution instead of
> just hardwired 'i' then why do even you need a quote_* function? I
> thought the appendStringInfo uses a printf style format-string
> internally, so I assumed it is possible to substitute the state char
> directly using '%c'.
>

Since we have removed cascading standby support, this condition
(sync_state != 'i') is no longer needed in the query.

> ~~~
>
> 19.
> +
> +
> +
> + /* We are done, free remote_slot_list elements */
> + list_free_deep(remote_slot_list);
> +
> + walrcv_clear_result(res);
> +
> + return slot_updated;
> +}
>
> Excessive blank lines.
>
> ~~~
>
> 20. validate_primary_slot
>
> + appendStringInfo(&cmd,
> + "SELECT count(*) = 1 from pg_replication_slots "
> + "WHERE slot_type='physical' and slot_name=%s",
> + quote_literal_cstr(PrimarySlotName));
>
>
> /and/AND/
>
> ~~~
>
> 21.
> + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
> + tuple_ok = tuplestore_gettupleslot(res->tuplestore, true, false, slot);
> + Assert(tuple_ok); /* It must return one tuple */
>
> IMO it's better to use all the var names the same across all
> functions? So call this 'tupslot' like the other
> MakeSingleTupleTableSlot result.
>
> ~~~
>
> 22. validate_slotsync_parameters
>
> +/*
> + * Checks if GUCs are set appropriately before starting slot sync worker
> + *
> + * The slot sync worker can not start if 'enable_syncslot' is off and
> + * since 'enable_syncslot' is ON, check that the other GUC settings
> + * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo)
> + * are compatible with slot synchronization. If not, raise ERROR.
> + */
> +static void
> +validate_slotsync_parameters(char **dbname)
> +{
>
> 22a.
> The comment is quite verbose. IMO the 2nd para seems just unnecessary
> detail of the 1st para.
>
> SUGGESTION
> Check that all necessary GUCs for slot synchronization are set
> appropriately. If not, raise an ERROR.
>
> ~~~
>
> 22b.
> IMO (and given what was said in the comment about enable_syncslot must
> be on)the first statement of this function should be:
>
> /* Sanity check. */
> Assert(enable_syncslot);
>
> ~~~
>
> 23. slotsync_reread_config
>
> + old_dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
> + Assert(old_dbname);
>
> (This is same comment as old review [1]#61)
>
> Hmm. I still don't see why this extraction of the dbname cannot be
> deferred until later when you know the PrimaryConnInfo has changed,
> otherwise, it might be redundant to do this. Shveta replied [2] that
> "Once PrimaryConnInfo is changed, we can not get old-dbname.", but I'm
> not so sure. Isn't this walrcv_get_dbname_from_conninfo just doing a
> string search -- Why can't you defer this until you know
> conninfoChanged is true, and then to get the old_dbname, you can just
> pass the old_primary_conninfo. E.g. call like
> walrcv_get_dbname_from_conninfo(old_primary_conninfo); Maybe I am
> mistaken.
>

Sorry missed your point earlier that we can use old_primary_conninfo
to extract dbname later.
I have removed this re-validation now as we will restart the worker in
case of a parameter change similar to the case of logical apply
worker. So these changes are no longer needed.

> ~~
>
> 24.
> + /*
> + * Since we have initialized this worker with the old dbname, thus
> + * exit if dbname changed. Let it get restarted and connect to the new
> + * dbname specified.
> + */
> + if (conninfoChanged && strcmp(old_dbname, new_dbname) != 0)
> + ereport(ERROR,
> + errmsg("exiting slot sync worker as dbname in "
> +    "primary_conninfo changed"));
>
> IIUC when the tablesync has to restart, it emits a LOG message before
> it exits; but it's not an ERROR. So, shouldn't this be similar -- IMO
> it is not an "error" for the user to wish to change the dbname. Maybe
> this should be LOG followed by an explicit exit. If you agree, then it
> might be better to encapsulate such logic in some little function:
>
> // pseudo-code
> void slotsync_worker_restart(const char *msg)
> {
> ereport(LOG, msg...
> exit(0);
> }
>

we can not do proc_exit(0), as then postmaster will not restart it on
clean-exit. I agree with your logic, but will have another argument in
this function to accept 'exit code' from the caller.

> ~~~
>
> 25. ReplSlotSyncWorkerMain
>
> + for (;;)
> + {
> + int rc;
> + long naptime = WORKER_DEFAULT_NAPTIME_MS;
> + TimestampTz now;
> + bool slot_updated;
> +
> + ProcessSlotSyncInterrupts(wrconn);
> +
> + slot_updated = synchronize_slots(wrconn);
>
> Here I think the 'slot_updated' should be renamed to the same name as
> in #16 above (e.g. 'some_slot_updated' or 'any_slot_updated' or
> 'update_occurred' etc).
>
> ~~~
>
> 26. SlotSyncWorkerRegister
>
> + if (!enable_syncslot)
> + {
> + ereport(LOG,
> + errmsg("skipping slots synchronization because enable_syncslot is "
> +    "disabled."));
> + return;
> + }
>
> Instead of saying "because..." in the error message maybe keep the
> message more terse and describe the "because" part in the errdetail
>
> SUGGESTION
> errmsg("skipping slot synchronization")
> errdetail("enable_syncslot is disabled.")
>
>
> ======
> src/backend/replication/slot.c
>
> 27.
> + * sync_state: Defines slot synchronization state. For user created slots, it
> + * is SYNCSLOT_STATE_NONE and for the slots being synchronized on the physical
> + * standby, it is either SYNCSLOT_STATE_INITIATED or SYNCSLOT_STATE_READY
>   */
>  void
>  ReplicationSlotCreate(const char *name, bool db_specific,
>     ReplicationSlotPersistency persistency,
> -   bool two_phase, bool failover)
> +   bool two_phase, bool failover, char sync_state)
>
>
> 27a.
> Why is this comment even mentioning SYNCSLOT_STATE_READY? IIUC it
> doesn't make sense to ever call ReplicationSlotCreate directly setting
> the 'r' state (e.g., bypassing 'i' ???)
>
> ~
>
> 27b.
> Indeed, IMO there should be Assert(sync_state == SYNCSLOT_STATE_NONE
> || syncstate == SYNCSLOT_STATE_INITIATED); to guarantee this.
>
> ======
> src/include/replication/slot.h
>
> 28.
> + /*
> + * Is this a slot created by a sync-slot worker?
> + *
> + * Only relevant for logical slots on the physical standby.
> + */
> + char sync_state;
> +
>
> (probably I am repeating a previous thought here)
>
> The comment says the field is only relevant for standby, and that's
> how I've been visualizing it, and why I had previously suggested even
> renaming it to 'standby_sync_state'. However, replies are saying that
> after failover these sync_states also have "some meaning for the
> primary server".
>
> That's the part I have trouble understanding. IIUC the server states
> are just either all 'n' (means nothing) or 'r' because they are just
> leftover from the old standby state. So, does it *truly* have meaning
> for the server? Or should those states somehow be removed/ignored on
> the new primary? Anyway, the point is that if this field does have
> meaning also on the primary (I doubt) then those details should be in
> this comment. Otherwise "Only relevant ... on the standby" is too
> misleading.
>

I have modified it currently, but I will give another thought on your
suggestions here (and in earlier emails) and will let you know.

> ======
> .../t/050_standby_failover_slots_sync.pl
>

We are working on CFbot failure fixes in this file and restructing the
tests here. Thus I am keeping these test comments on hold and will
address in next version.

> 29.
> +# Create table and publication on primary
> +$primary->safe_psql('postgres', "CREATE TABLE tab_mypub3 (a int
> PRIMARY KEY);");
> +$primary->safe_psql('postgres', "CREATE PUBLICATION mypub3 FOR TABLE
> tab_mypub3;");
> +
>
> 29a.
> /on primary/on the primary/
>
> ~
>
> 29b.
> Consider to combine those DDL
>
> ~
>
> 29c.
> Perhaps for consistency, you should be calling this 'regress_mypub3'.
>
> ~~~
>
> 30.
> +# Create a subscriber node
> +my $subscriber3 = PostgreSQL::Test::Cluster->new('subscriber3');
> +$subscriber3->init(allows_streaming => 'logical');
> +$subscriber3->start;
> +$subscriber3->safe_psql('postgres', "CREATE TABLE tab_mypub3 (a int
> PRIMARY KEY);");
> +
> +# Create a subscription with failover = true & wait for sync to complete.
> +$subscriber3->safe_psql('postgres',
> + "CREATE SUBSCRIPTION mysub3 CONNECTION '$publisher_connstr' "
> +   . "PUBLICATION mypub3 WITH (slot_name = lsub3_slot, failover = true);");
> +$subscriber3->wait_for_subscription_sync;
>
> 30a
> Consider combining those DDLs.
>
> ~
>
> 30b.
> Probably for consistency, you should be calling this 'regress_mysub3'.
>
> ~~~
>
> 31.
> +# Advance lsn on the primary
> +$primary->safe_psql('postgres',
> + "SELECT pg_log_standby_snapshot();");
> +$primary->safe_psql('postgres',
> + "SELECT pg_log_standby_snapshot();");
> +$primary->safe_psql('postgres',
> + "SELECT pg_log_standby_snapshot();");
> +
>
> Consider combining all those DDLs.
>
> ~~~
>
> 32.
> +# Truncate table on primary
> +$primary->safe_psql('postgres',
> + "TRUNCATE TABLE tab_mypub3;");
> +
> +# Insert data on the primary
> +$primary->safe_psql('postgres',
> + "INSERT INTO tab_mypub3 SELECT generate_series(1, 10);");
> +
>
> Consider combining those DDLs.
>
> ~~~
>
> 33.
> +# Confirm that restart_lsn of lsub3_slot slot is synced to the standby
> +$result = $standby3->safe_psql('postgres',
> + qq[SELECT '$primary_lsn' >= restart_lsn from pg_replication_slots
> WHERE slot_name = 'lsub3_slot';]);
> +is($result, 't', 'restart_lsn of slot lsub3_slot synced to standby');
>
>
> Does "'$primary_lsn' >= restart_lsn" make sense here? NOTE, the sign
> was '<=' in v43-0002
>
> ~~~
>
> 34.
> +# Confirm that confirmed_flush_lsn of lsub3_slot slot is synced to the standby
> +$result = $standby3->safe_psql('postgres',
> + qq[SELECT '$primary_lsn' >= confirmed_flush_lsn from
> pg_replication_slots WHERE slot_name = 'lsub3_slot';]);
> +is($result, 't', 'confirmed_flush_lsn of slot lsub3_slot synced to
> the standby');
>
> Does "'$primary_lsn' >= confirmed_flush_lsn" make sense here? NOTE,
> the sign was '<=' in v43-0002
>
> ~~~
>
> 35.
> +##################################################
> +# Test that synchronized slot can neither be docoded nor dropped by the user
> +##################################################
>
> 35a.
> /docoded/decoded/
>
> ~
>
> 35b.
> Please give explanation in the comment *why* those ops are not allowed
> (e.g. because the hot_standby_feedback GUC does not have an accepted
> value)
>
> ~~~
>
> 36.
> +##################################################
> +# Create another slot which stays in sync_state as initiated ('i')
> +##################################################
> +
>
> Please explain the comment as to *why* it gets stuck in the initiated state.
>
>
> ~~~
>
> 37.
> +##################################################
> +# Promote the standby3 to primary. Confirm that:
> +# a) the sync-ready('r') slot 'lsub3_slot' is retained on new primary
> +# b) the initiated('i') slot 'logical_slot'is dropped on promotion
> +# c) logical replication for mysub3 is resumed succesfully after failover
> +##################################################
>
>
> /'logical_slot'is/'logical_slot' is/ (missing space)
>
> /succesfully/successfully/
>
> ~~~
>
> 38.
> +# Update subscription with new primary's connection info
> +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 DISABLE;");
> +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3
> CONNECTION '$standby3_conninfo';");
> +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 ENABLE;");
>
>
> Consider combining all those DDLs.
>
> ~~~
>
> 39.
> +
> +# Insert data on the new primary
> +$standby3->safe_psql('postgres',
> + "INSERT INTO tab_mypub3 SELECT generate_series(11, 20);");
> +
> +# Confirm that data in tab_mypub3 replicated on subscriber
> +is( $subscriber3->safe_psql('postgres', q{SELECT count(*) FROM tab_mypub3;}),
> + "20",
> + 'data replicated from new primary');
>
> Shouldn't there be some wait_for_subscription_sync logic (or similar)
> here just to ensure the subscriber3 had time to receive that data
> before you immediately check that it had arrived?
>
> ======
> [1] My v43-0002 review.
> https://www.postgresql.org/message-id/CAHut%2BPuuqEpDse5msENsVuK3rjTRN-QGS67rRCGVv%2BzcT-f0GA%40mail.gmail.com
> [2] Replies to v43-0002 review.
> https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY%2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
pgsql-hackers by date:
From: shveta malik
Date: 15 December 2023, 05:33:33
Subject: Re: Synchronizing slots from primary to standby
From: shveta malik
Date: 15 December 2023, 05:42:53
Subject: Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

Previous

Next