Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uCKGo3rjSuM42i2UL5M7oeLTxV3Sht+=O9f1yu=fWPPjg@mail.gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
On Wed, Dec 13, 2023 at 3:53 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shveta, here are some review comments for v45-0002. > Thanks for the feedback. Addressed these in v48. Please find my comments on some. > ====== > doc/src/sgml/bgworker.sgml > > 1. > + <variablelist> > + <varlistentry> > + <term><literal>BgWorkerStart_PostmasterStart</literal></term> > + <listitem> > + <para> > + <indexterm><primary>BgWorkerStart_PostmasterStart</primary></indexterm> > + Start as soon as postgres itself has finished its own initialization; > + processes requesting this are not eligible for database connections. > + </para> > + </listitem> > + </varlistentry> > + > + <varlistentry> > + <term><literal>BgWorkerStart_ConsistentState</literal></term> > + <listitem> > + <para> > + <indexterm><primary>BgWorkerStart_ConsistentState</primary></indexterm> > + Start as soon as a consistent state has been reached in a hot-standby, > + allowing processes to connect to databases and run read-only queries. > + </para> > + </listitem> > + </varlistentry> > + > + <varlistentry> > + <term><literal>BgWorkerStart_RecoveryFinished</literal></term> > + <listitem> > + <para> > + <indexterm><primary>BgWorkerStart_RecoveryFinished</primary></indexterm> > + Start as soon as the system has entered normal read-write state. Note > + that the <literal>BgWorkerStart_ConsistentState</literal> and > + <literal>BgWorkerStart_RecoveryFinished</literal> are equivalent > + in a server that's not a hot standby. > + </para> > + </listitem> > + </varlistentry> > + > + <varlistentry> > + <term><literal>BgWorkerStart_ConsistentState_HotStandby</literal></term> > + <listitem> > + <para> > + <indexterm><primary>BgWorkerStart_ConsistentState_HotStandby</primary></indexterm> > + Same meaning as <literal>BgWorkerStart_ConsistentState</literal> but > + it is more strict in terms of the server i.e. start the worker only > + if it is hot-standby. > + </para> > + </listitem> > + </varlistentry> > + </variablelist> > > Maybe reorder these slightly, because I felt it is better if the > BgWorkerStart_ConsistentState_HotStandby comes next after > BgWorkerStart_ConsistentState, which it refers to > > For example:: > 1st.BgWorkerStart_PostmasterStart > 2nd.BgWorkerStart_ConsistentState > 3rd.BgWorkerStart_ConsistentState_HotStandby > 4th.BgWorkerStart_RecoveryFinished > > ====== > doc/src/sgml/config.sgml > > 2. > <varname>enable_syncslot</varname> = true > > Not sure, but I thought the "= true" part should be formatted too. > > SUGGESTION > <literal>enable_syncslot = true</literal> > > ====== > doc/src/sgml/logicaldecoding.sgml > > 3. > + <para> > + A logical replication slot on the primary can be synchronized to the hot > + standby by enabling the failover option during slot creation and setting > + <varname>enable_syncslot</varname> on the standby. For the synchronization > + to work, it is mandatory to have a physical replication slot between the > + primary and the standby. It's highly recommended that the said physical > + replication slot is listed in <varname>standby_slot_names</varname> on > + the primary to prevent the subscriber from consuming changes faster than > + the hot standby. Additionally, <varname>hot_standby_feedback</varname> > + must be enabled on the standby for the slots synchronization to work. > + </para> > > I felt those parts that describe the mandatory GUCs should be kept together. > > SUGGESTION > For the synchronization to work, it is mandatory to have a physical > replication slot between the primary and the standby, and > <varname>hot_standby_feedback</varname> must be enabled on the > standby. > > It's also highly recommended that the said physical replication slot > is named in <varname>standby_slot_names</varname> list on the primary, > to prevent the subscriber from consuming changes faster than the hot > standby. > > ~~~ > > 4. (Chapter 49) > > By enabling synchronization of slots, logical replication can be > resumed after failover depending upon the > pg_replication_slots.sync_state for the synchronized slots on the > standby at the time of failover. Only slots that were in ready > sync_state ('r') on the standby before failover can be used for > logical replication after failover. However, the slots which were in > initiated sync_state ('i') and not sync-ready ('r') at the time of > failover will be dropped and logical replication for such slots can > not be resumed after failover. This applies to the case where a > logical subscription is disabled before failover and is enabled after > failover. If the synchronized slot due to disabled subscription could > not be made sync-ready ('r') on standby, then the subscription can not > be resumed after failover even when enabled. If the primary is idle, > then the synchronized slots on the standby may take a noticeable time > to reach the ready ('r') sync_state. This can be sped up by calling > the pg_log_standby_snapshot function on the primary. > > ~ > > Somehow, I still felt all that was too wordy/repetitive. Below is my > attempt to make it more concise. Thoughts? > > SUGGESTION > The ability to resume logical replication after failover depends upon > the pg_replication_slots.sync_state value for the synchronized slots > on the standby at the time of failover. Only slots that have attained > a "ready" sync_state ('r') on the standby before failover can be used > for logical replication after failover. Slots that have not yet > reached 'r' state (they are still 'i') will be dropped, therefore > logical replication for those slots cannot be resumed. For example, if > the synchronized slot could not become sync-ready on standby due to a > disabled subscription, then the subscription cannot be resumed after > failover even when it is enabled. > > If the primary is idle, the synchronized slots on the standby may take > a noticeable time to reach the ready ('r') sync_state. This can be > sped up by calling the pg_log_standby_snapshot function on the > primary. > > ====== > doc/src/sgml/system-views.sgml > > 5. > + <para> > + Defines slot synchronization state. This is meaningful on the physical > + standby which has configured <varname>enable_syncslot</varname> = true > + </para> > > As mentioned in the previous review comment ([1]#10) I thought it > might be good to include a hyperlink cross-reference to the > 'enable_syncslot' GUC. > > ~~~ > > 6. > + <para> > + The hot standby can have any of these sync_state for the slots but on a > + hot standby, the slots with state 'r' and 'i' can neither be used for > + logical decoding nor dropped by the user. The primary server will have > + sync_state as 'n' for all the slots. But if the standby is promoted to > + become the new primary server, sync_state can be seen 'r' as well. On > + this new primary server, slots with sync_state as 'r' and 'n' will > + behave the same. > + </para></entry> > > 6a. > /these sync_state for the slots/these sync_state values for the slots/ > > ~ > > 6b > Hm. I still felt (same as previous review [1]#12b) that there seems > too much information here. > > IIUC the sync_state is only meaningful on the standby. Sure, it might > have some values line 'n' or 'r' on the primary also, but those either > mean nothing ('n') or are leftover states from a previous failover > from a standby ('r'), which also means nothing. So can't we just say > it more succinctly like that? > > SUGGESTION > The sync_state has no meaning on the primary server; the primary > sync_state value is default 'n' for all slots but may (if leftover > from a promoted standby) also be 'r'. > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 7. > static void > libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, > - bool failover) > + bool failover) > > Still seems to be tampering with indentation that should only be in patch 0001. > > ====== > src/backend/replication/logical/slotsync.c > > 8. wait_for_primary_slot_catchup > > The meaning of the boolean return of this function is still not > described by the function comment. > > ~~~ > > 9. > + * If passed, *wait_attempts_exceeded will be set to true only if this > + * function exits after exhausting its wait attempts. It will be false > + * in all the other cases like failure, remote-slot invalidation, primary > + * could catch up. > > The above already says when a return false happens, so it seems > overkill to give more information. > > SUGGESTION > If passed, *wait_attempts_exceeded will be set to true only if this > function exits due to exhausting its wait attempts. It will be false > in all the other cases. > > ~~~ > > 10. > > +static bool > +wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *wait_attempts_exceeded) > +{ > +#define WAIT_OUTPUT_COLUMN_COUNT 4 > +#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5 > + > > 10a > Maybe the long constant name is too long. How about > WAIT_PRIMARY_CATCHUP_ATTEMPTS? > > ~~~ > > 10b. > IMO it is better to Assert the input value of this kind of side-effect > return parameter, to give a better understanding and to prevent future > accidents. > > SUGGESTION > Assert(wait_attempts_exceeded == NULL |} *wait_attempts_exceeded == false); > > ~~~ > > 11. synchronize_one_slot > > + ReplicationSlot *s; > + ReplicationSlot *slot; > + char sync_state = '\0'; > > 11a. > I don't think you need both 's' and 'slot' ReplicationSlot -- it looks > a bit odd. Can't you just reuse the one 'slot' variable? > > ~ > > 11b. > Also, maybe those assignment like > + slot = MyReplicationSlot; > > can have an explanatory comment like: > /* For convenience, we assign MyReplicationSlot to a shorter variable name. */ > I have changed it to slightly simpler one, if that is okay? > ~~~ > > 12. > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > +{ > + ReplicationSlot *s; > + ReplicationSlot *slot; > + char sync_state = '\0'; > > In my previous review [1]#33a I thought it was strange to assign the > sync_state (which is essentially an enum) to some meaningless value, > so I suggested it should be set to SYNCSLOT_STATE_NONE in the > declaration. The reply [2] was "No, that will change the flow. It > should stay uninitialized if the slot is not found." > > But I am not convinced there is any flow problem. Also, > SYNCSLOT_STATE_NONE seems the naturally correct default for something > with no state. It cannot be found and be SYNCSLOT_STATE_NONE at the > same time (that is reported as an ERROR "skipping sync of slot") so I > see no problem. > > The CURRENT code is like this: > > /* Slot created by the slot sync worker exists, sync it */ > if (sync_state) > { > Assert(sync_state == SYNCSLOT_STATE_READY || sync_state == > SYNCSLOT_STATE_INITIATED); > ... > } > /* Otherwise create the slot first. */ > else > { > ... > } > > AFAICT that could easily be changed to like below, with no change to > the logic, and it avoids setting strange values. > > SUGGESTION. > > if (sync_state == SYNCSLOT_STATE_NONE) > { > /* Slot not found. Create it. */ > .. > } > else > { > Assert(sync_state == SYNCSLOT_STATE_READY || sync_state == > SYNCSLOT_STATE_INITIATED); > ... > } > I have restructured the entire code here and thus initialization of sync_state is no longer needed. Please review now and let me know. > ~~~ > > 13. synchronize_one_slot > > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > > This *slot_updated parameter looks dubious. It is used in a loop from > the caller to mean that ANY slot was updated -- e.g. maybe it is true > or false on entry to this function. > > But, Instead of having some dependency between this function and the > caller, IMO it makes more sense if we would make this just a boolean > function in the first place (e.g. was updated? T/F) > > Then the caller can also be written more easily like: > > some_slot_updated |= synchronize_one_slot(wrconn, remote_slot); > > ~~~ > > 14. > + /* Search for the named slot */ > + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) > + { > + SpinLockAcquire(&s->mutex); > + sync_state = s->data.sync_state; > + SpinLockRelease(&s->mutex); > + > + /* User created slot with the same name exists, raise ERROR. */ > + if (sync_state == SYNCSLOT_STATE_NONE) > + { > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("skipping sync of slot \"%s\" as it is a user created" > + " slot", remote_slot->name), > + errdetail("This slot has failover enabled on the primary and" > + " thus is sync candidate but user created slot with" > + " the same name already exists on the standby"))); > + } > + } > > > Extra curly brackets around the ereport are not needed. > > ~~~ > > 15. > + /* > + * Sanity check: With hot_standby_feedback enabled and > + * invalidations handled apropriately as above, this should never > + * happen. > + */ > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > + { > + elog(ERROR, > + "not synchronizing local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization " > + " would move it backwards", remote_slot->name, > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > + } > > 15a. > /apropriately/appropriately/ > > ~ > > 15b. > Extra curly brackets around the elog are not needed. > > ~~~ > > 16. synchronize_slots > > +static bool > +synchronize_slots(WalReceiverConn *wrconn) > +{ > +#define SLOTSYNC_COLUMN_COUNT 9 > + Oid slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID, > + LSNOID, XIDOID, BOOLOID, BOOLOID, TEXTOID, INT2OID}; > + > + WalRcvExecResult *res; > + TupleTableSlot *tupslot; > + StringInfoData s; > + List *remote_slot_list = NIL; > + MemoryContext oldctx = CurrentMemoryContext; > + ListCell *lc; > + bool slot_updated = false; > > Suggest renaming 'slot_updated' to 'some_slot_updated' or > 'update_occurred' etc because the current name makes it look like it > applies to a single slot, but it doesn't. > > ~~~ > > 17. > + SpinLockAcquire(&WalRcv->mutex); > + if (!WalRcv || > + (WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return slot_updated; > + } > + SpinLockRelease(&WalRcv->mutex); > > IMO "return false;" here is more clear than saying "return slot_updated;" > > ~~~ > > 18. > + appendStringInfo(&s, > + "SELECT slot_name, plugin, confirmed_flush_lsn," > + " restart_lsn, catalog_xmin, two_phase, failover," > + " database, pg_get_slot_invalidation_cause(slot_name)" > + " FROM pg_catalog.pg_replication_slots" > + " WHERE failover and sync_state != 'i'"); > > 18a. > /and/AND/ > > ~ > > 18b. > In the reply post (see [2]#32) Shveta said "I could not find quote_* > function for a character just like we have 'quote_literal_cstr' for > string". If you still want to use constant substitution instead of > just hardwired 'i' then why do even you need a quote_* function? I > thought the appendStringInfo uses a printf style format-string > internally, so I assumed it is possible to substitute the state char > directly using '%c'. > Since we have removed cascading standby support, this condition (sync_state != 'i') is no longer needed in the query. > ~~~ > > 19. > + > + > + > + /* We are done, free remote_slot_list elements */ > + list_free_deep(remote_slot_list); > + > + walrcv_clear_result(res); > + > + return slot_updated; > +} > > Excessive blank lines. > > ~~~ > > 20. validate_primary_slot > > + appendStringInfo(&cmd, > + "SELECT count(*) = 1 from pg_replication_slots " > + "WHERE slot_type='physical' and slot_name=%s", > + quote_literal_cstr(PrimarySlotName)); > > > /and/AND/ > > ~~~ > > 21. > + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); > + tuple_ok = tuplestore_gettupleslot(res->tuplestore, true, false, slot); > + Assert(tuple_ok); /* It must return one tuple */ > > IMO it's better to use all the var names the same across all > functions? So call this 'tupslot' like the other > MakeSingleTupleTableSlot result. > > ~~~ > > 22. validate_slotsync_parameters > > +/* > + * Checks if GUCs are set appropriately before starting slot sync worker > + * > + * The slot sync worker can not start if 'enable_syncslot' is off and > + * since 'enable_syncslot' is ON, check that the other GUC settings > + * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo) > + * are compatible with slot synchronization. If not, raise ERROR. > + */ > +static void > +validate_slotsync_parameters(char **dbname) > +{ > > 22a. > The comment is quite verbose. IMO the 2nd para seems just unnecessary > detail of the 1st para. > > SUGGESTION > Check that all necessary GUCs for slot synchronization are set > appropriately. If not, raise an ERROR. > > ~~~ > > 22b. > IMO (and given what was said in the comment about enable_syncslot must > be on)the first statement of this function should be: > > /* Sanity check. */ > Assert(enable_syncslot); > > ~~~ > > 23. slotsync_reread_config > > + old_dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + Assert(old_dbname); > > (This is same comment as old review [1]#61) > > Hmm. I still don't see why this extraction of the dbname cannot be > deferred until later when you know the PrimaryConnInfo has changed, > otherwise, it might be redundant to do this. Shveta replied [2] that > "Once PrimaryConnInfo is changed, we can not get old-dbname.", but I'm > not so sure. Isn't this walrcv_get_dbname_from_conninfo just doing a > string search -- Why can't you defer this until you know > conninfoChanged is true, and then to get the old_dbname, you can just > pass the old_primary_conninfo. E.g. call like > walrcv_get_dbname_from_conninfo(old_primary_conninfo); Maybe I am > mistaken. > Sorry missed your point earlier that we can use old_primary_conninfo to extract dbname later. I have removed this re-validation now as we will restart the worker in case of a parameter change similar to the case of logical apply worker. So these changes are no longer needed. > ~~ > > 24. > + /* > + * Since we have initialized this worker with the old dbname, thus > + * exit if dbname changed. Let it get restarted and connect to the new > + * dbname specified. > + */ > + if (conninfoChanged && strcmp(old_dbname, new_dbname) != 0) > + ereport(ERROR, > + errmsg("exiting slot sync worker as dbname in " > + "primary_conninfo changed")); > > IIUC when the tablesync has to restart, it emits a LOG message before > it exits; but it's not an ERROR. So, shouldn't this be similar -- IMO > it is not an "error" for the user to wish to change the dbname. Maybe > this should be LOG followed by an explicit exit. If you agree, then it > might be better to encapsulate such logic in some little function: > > // pseudo-code > void slotsync_worker_restart(const char *msg) > { > ereport(LOG, msg... > exit(0); > } > we can not do proc_exit(0), as then postmaster will not restart it on clean-exit. I agree with your logic, but will have another argument in this function to accept 'exit code' from the caller. > ~~~ > > 25. ReplSlotSyncWorkerMain > > + for (;;) > + { > + int rc; > + long naptime = WORKER_DEFAULT_NAPTIME_MS; > + TimestampTz now; > + bool slot_updated; > + > + ProcessSlotSyncInterrupts(wrconn); > + > + slot_updated = synchronize_slots(wrconn); > > Here I think the 'slot_updated' should be renamed to the same name as > in #16 above (e.g. 'some_slot_updated' or 'any_slot_updated' or > 'update_occurred' etc). > > ~~~ > > 26. SlotSyncWorkerRegister > > + if (!enable_syncslot) > + { > + ereport(LOG, > + errmsg("skipping slots synchronization because enable_syncslot is " > + "disabled.")); > + return; > + } > > Instead of saying "because..." in the error message maybe keep the > message more terse and describe the "because" part in the errdetail > > SUGGESTION > errmsg("skipping slot synchronization") > errdetail("enable_syncslot is disabled.") > > > ====== > src/backend/replication/slot.c > > 27. > + * sync_state: Defines slot synchronization state. For user created slots, it > + * is SYNCSLOT_STATE_NONE and for the slots being synchronized on the physical > + * standby, it is either SYNCSLOT_STATE_INITIATED or SYNCSLOT_STATE_READY > */ > void > ReplicationSlotCreate(const char *name, bool db_specific, > ReplicationSlotPersistency persistency, > - bool two_phase, bool failover) > + bool two_phase, bool failover, char sync_state) > > > 27a. > Why is this comment even mentioning SYNCSLOT_STATE_READY? IIUC it > doesn't make sense to ever call ReplicationSlotCreate directly setting > the 'r' state (e.g., bypassing 'i' ???) > > ~ > > 27b. > Indeed, IMO there should be Assert(sync_state == SYNCSLOT_STATE_NONE > || syncstate == SYNCSLOT_STATE_INITIATED); to guarantee this. > > ====== > src/include/replication/slot.h > > 28. > + /* > + * Is this a slot created by a sync-slot worker? > + * > + * Only relevant for logical slots on the physical standby. > + */ > + char sync_state; > + > > (probably I am repeating a previous thought here) > > The comment says the field is only relevant for standby, and that's > how I've been visualizing it, and why I had previously suggested even > renaming it to 'standby_sync_state'. However, replies are saying that > after failover these sync_states also have "some meaning for the > primary server". > > That's the part I have trouble understanding. IIUC the server states > are just either all 'n' (means nothing) or 'r' because they are just > leftover from the old standby state. So, does it *truly* have meaning > for the server? Or should those states somehow be removed/ignored on > the new primary? Anyway, the point is that if this field does have > meaning also on the primary (I doubt) then those details should be in > this comment. Otherwise "Only relevant ... on the standby" is too > misleading. > I have modified it currently, but I will give another thought on your suggestions here (and in earlier emails) and will let you know. > ====== > .../t/050_standby_failover_slots_sync.pl > We are working on CFbot failure fixes in this file and restructing the tests here. Thus I am keeping these test comments on hold and will address in next version. > 29. > +# Create table and publication on primary > +$primary->safe_psql('postgres', "CREATE TABLE tab_mypub3 (a int > PRIMARY KEY);"); > +$primary->safe_psql('postgres', "CREATE PUBLICATION mypub3 FOR TABLE > tab_mypub3;"); > + > > 29a. > /on primary/on the primary/ > > ~ > > 29b. > Consider to combine those DDL > > ~ > > 29c. > Perhaps for consistency, you should be calling this 'regress_mypub3'. > > ~~~ > > 30. > +# Create a subscriber node > +my $subscriber3 = PostgreSQL::Test::Cluster->new('subscriber3'); > +$subscriber3->init(allows_streaming => 'logical'); > +$subscriber3->start; > +$subscriber3->safe_psql('postgres', "CREATE TABLE tab_mypub3 (a int > PRIMARY KEY);"); > + > +# Create a subscription with failover = true & wait for sync to complete. > +$subscriber3->safe_psql('postgres', > + "CREATE SUBSCRIPTION mysub3 CONNECTION '$publisher_connstr' " > + . "PUBLICATION mypub3 WITH (slot_name = lsub3_slot, failover = true);"); > +$subscriber3->wait_for_subscription_sync; > > 30a > Consider combining those DDLs. > > ~ > > 30b. > Probably for consistency, you should be calling this 'regress_mysub3'. > > ~~~ > > 31. > +# Advance lsn on the primary > +$primary->safe_psql('postgres', > + "SELECT pg_log_standby_snapshot();"); > +$primary->safe_psql('postgres', > + "SELECT pg_log_standby_snapshot();"); > +$primary->safe_psql('postgres', > + "SELECT pg_log_standby_snapshot();"); > + > > Consider combining all those DDLs. > > ~~~ > > 32. > +# Truncate table on primary > +$primary->safe_psql('postgres', > + "TRUNCATE TABLE tab_mypub3;"); > + > +# Insert data on the primary > +$primary->safe_psql('postgres', > + "INSERT INTO tab_mypub3 SELECT generate_series(1, 10);"); > + > > Consider combining those DDLs. > > ~~~ > > 33. > +# Confirm that restart_lsn of lsub3_slot slot is synced to the standby > +$result = $standby3->safe_psql('postgres', > + qq[SELECT '$primary_lsn' >= restart_lsn from pg_replication_slots > WHERE slot_name = 'lsub3_slot';]); > +is($result, 't', 'restart_lsn of slot lsub3_slot synced to standby'); > > > Does "'$primary_lsn' >= restart_lsn" make sense here? NOTE, the sign > was '<=' in v43-0002 > > ~~~ > > 34. > +# Confirm that confirmed_flush_lsn of lsub3_slot slot is synced to the standby > +$result = $standby3->safe_psql('postgres', > + qq[SELECT '$primary_lsn' >= confirmed_flush_lsn from > pg_replication_slots WHERE slot_name = 'lsub3_slot';]); > +is($result, 't', 'confirmed_flush_lsn of slot lsub3_slot synced to > the standby'); > > Does "'$primary_lsn' >= confirmed_flush_lsn" make sense here? NOTE, > the sign was '<=' in v43-0002 > > ~~~ > > 35. > +################################################## > +# Test that synchronized slot can neither be docoded nor dropped by the user > +################################################## > > 35a. > /docoded/decoded/ > > ~ > > 35b. > Please give explanation in the comment *why* those ops are not allowed > (e.g. because the hot_standby_feedback GUC does not have an accepted > value) > > ~~~ > > 36. > +################################################## > +# Create another slot which stays in sync_state as initiated ('i') > +################################################## > + > > Please explain the comment as to *why* it gets stuck in the initiated state. > > > ~~~ > > 37. > +################################################## > +# Promote the standby3 to primary. Confirm that: > +# a) the sync-ready('r') slot 'lsub3_slot' is retained on new primary > +# b) the initiated('i') slot 'logical_slot'is dropped on promotion > +# c) logical replication for mysub3 is resumed succesfully after failover > +################################################## > > > /'logical_slot'is/'logical_slot' is/ (missing space) > > /succesfully/successfully/ > > ~~~ > > 38. > +# Update subscription with new primary's connection info > +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 DISABLE;"); > +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 > CONNECTION '$standby3_conninfo';"); > +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 ENABLE;"); > > > Consider combining all those DDLs. > > ~~~ > > 39. > + > +# Insert data on the new primary > +$standby3->safe_psql('postgres', > + "INSERT INTO tab_mypub3 SELECT generate_series(11, 20);"); > + > +# Confirm that data in tab_mypub3 replicated on subscriber > +is( $subscriber3->safe_psql('postgres', q{SELECT count(*) FROM tab_mypub3;}), > + "20", > + 'data replicated from new primary'); > > Shouldn't there be some wait_for_subscription_sync logic (or similar) > here just to ensure the subscriber3 had time to receive that data > before you immediately check that it had arrived? > > ====== > [1] My v43-0002 review. > https://www.postgresql.org/message-id/CAHut%2BPuuqEpDse5msENsVuK3rjTRN-QGS67rRCGVv%2BzcT-f0GA%40mail.gmail.com > [2] Replies to v43-0002 review. > https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY%2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com > > Kind Regards, > Peter Smith. > Fujitsu Australia
pgsql-hackers by date: