Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uDcOf5Hvk_CdCCAbfx9SY+og===tgiuhWKzkYyqebui9g@mail.gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (Peter Smith <smithpb2250@gmail.com>) |
Responses |
RE: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
On Thu, Dec 7, 2023 at 1:33 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi. > > Here are my review comments for patch v43-0002. > Thanks for the feedback. I have addressed most of these in v45. Please find my response on a few which are pending or are not needed. > ====== > Commit message > > 1. > The nap time of worker is tuned according to the activity on the primary. > The worker starts with nap time of 10ms and if no activity is observed on > the primary for some time, then nap time is increased to 10sec. And if > activity is observed again, nap time is reduced back to 10ms. > > ~ > /nap time of worker/nap time of the worker/ > /And if/If/ > > ~~~ > > 2. > Slots synced on the standby can be identified using 'sync_state' column of > pg_replication_slots view. The values are: > 'n': none for user slots, > 'i': sync initiated for the slot but waiting for the remote slot on the > primary server to catch up. > 'r': ready for periodic syncs. > > ~ > > /identified using/identified using the/ > > The meaning of "identified by" is unclear to me. It also seems to > clash with later descriptions in system-views.sgml. Please see my > later review comment about it (in the sgml file) > I have rephrased it, please check now and let me know. > ====== > doc/src/sgml/bgworker.sgml > > 3. > bgw_start_time is the server state during which postgres should start > the process; it can be one of BgWorkerStart_PostmasterStart (start as > soon as postgres itself has finished its own initialization; processes > requesting this are not eligible for database connections), > BgWorkerStart_ConsistentState (start as soon as a consistent state has > been reached in a hot standby, allowing processes to connect to > databases and run read-only queries), and > BgWorkerStart_RecoveryFinished (start as soon as the system has > entered normal read-write state. Note that the > BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are > equivalent in a server that's not a hot standby), and > BgWorkerStart_ConsistentState_HotStandby (same meaning as > BgWorkerStart_ConsistentState but it is more strict in terms of the > server i.e. start the worker only if it is hot-standby; if it is > consistent state in non-standby, worker will not be started). Note > that this setting only indicates when the processes are to be started; > they do not stop when a different state is reached. > > ~ > > 3a. > This seems to have grown to become just one enormous sentence that is > too hard to read. IMO this should be changed to be a <variablelist> of > possible values instead of a big slab of text. I suspect it could also > be simplified quite a lot -- something like below > > SUGGESTION > bgw_start_time is the server state during which postgres should start > the process. Note that this setting only indicates when the processes > are to be started; they do not stop when a different state is reached. > Possible values are: > > - BgWorkerStart_PostmasterStart (start as soon as postgres itself has > finished its own initialization; processes requesting this are not > eligible for database connections) > > - BgWorkerStart_ConsistentState (start as soon as a consistent state > has been reached in a hot-standby, allowing processes to connect to > databases and run read-only queries) > > - BgWorkerStart_RecoveryFinished (start as soon as the system has > entered normal read-write state. Note that the > BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are > equivalent in a server that's not a hot standby) > > - BgWorkerStart_ConsistentState_HotStandby (same meaning as > BgWorkerStart_ConsistentState but it is more strict in terms of the > server i.e. start the worker only if it is hot-standby; if it is a > consistent state in non-standby, the worker will not be started). > > ~~~ > > 3b. > "i.e. start the worker only if it is hot-standby; if it is consistent > state in non-standby, worker will not be started" > > ~ > > Why is it even necessary to say the 2nd part "if it is consistent > state in non-standby, worker will not be started". It seems redundant > given 1st part says the same, right? > > > ====== > doc/src/sgml/config.sgml > > 4. > + <para> > + The standbys corresponding to the physical replication slots in > + <varname>standby_slot_names</varname> must enable > + <varname>enable_syncslot</varname> for the standbys to receive > + failover logical slots changes from the primary. > + </para> > > 4a. > Somehow "must enable enable_syncslot" seemed strange. Maybe re-word like: > > "must enable slot synchronization (see enable_syncslot)" > > OR > > "must configure enable_syncslot = true" > > ~~~ > > 4b. > (seems like repetitive use of "the standbys") > > /for the standbys to/to/ > > OR > > /for the standbys to/so they can/ > > ~~~ > > 5. > <varname>primary_conninfo</varname> string, or in a separate > - <filename>~/.pgpass</filename> file on the standby server (use > + <filename>~/.pgpass</filename> file on the standby server. (use > > This rearranged period seems unrelated to the current patch. Maybe > don't touch this. > > ~~~ > > 6. > + <para> > + Specify <literal>dbname</literal> in > + <varname>primary_conninfo</varname> string to allow synchronization > + of slots from the primary server to the standby server. > + This will only be used for slot synchronization. It is ignored > + for streaming. > </para> > > The wording "to allow synchronization of slots" seemed misleading to > me. Isn't that more the purpose of the 'enable_syncslot' GUC? I think > the intended wording is more like below: > > SUGGESTION > If slot synchronization is enabled then it is also necessary to > specify <literal>dbname</literal> in the > <varname>primary_conninfo</varname> string. This will only be used for > slot synchronization. It is ignored for streaming. > > ====== > doc/src/sgml/logicaldecoding.sgml > > 7. > + <para> > + A logical replication slot on the primary can be synchronized to the hot > + standby by enabling the failover option during slot creation and set > + <varname>enable_syncslot</varname> on the standby. For the synchronization > + to work, it is mandatory to have physical replication slot between the > + primary and the standby. This physical replication slot for the standby > + should be listed in <varname>standby_slot_names</varname> on the primary > + to prevent the subscriber from consuming changes faster than the hot > + standby. Additionally, similar to creating a logical replication slot > + on the hot standby, <varname>hot_standby_feedback</varname> should be > + set on the standby and a physical slot between the primary and the standby > + should be used. > + </para> > > > 7a. > /creation and set/creation and setting/ > /to have physical replication/to have a physical replication/ > > ~ > > 7b. > It's unclear why this is saying "should be listed in > standby_slot_names" and "hot_standby_feedback should be set on the > standby". Why is it saying "should" instead of MUST -- are these > optional? I thought the GUC validation function mandates these (???). > standby_slot_names setting is not mandatory, it is recommended though. OTOH hot_standby_feedback setting is mandatory. So I have changed accordingly. > ~ > > 7c. > Why does the paragraph say "and a physical slot between the primary > and the standby should be used."; isn't that exactly what was already > written earlier ("For the synchronization to work, it is mandatory to > have physical replication slot between the primary and the standby" > Removed the duplicate line. > ~~~ > > 8. > + <para> > + By enabling synchronization of slots, logical replication can be resumed > + after failover depending upon the > + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>sync_state</structfield> > + for the synchronized slots on the standby at the time of failover. > + The slots which were in ready sync_state ('r') on the standby before > + failover can be used for logical replication after failover. However, > + the slots which were in initiated sync_state ('i) and were not > + sync-ready ('r') at the time of failover will be dropped and logical > + replication for such slots can not be resumed after failover. This applies > + to the case where a logical subscription is disabled before > failover and is > + enabled after failover. If the synchronized slot due to disabled > + subscription could not be made sync-ready ('r') on standby, then the > + subscription can not be resumed after failover even when enabled. > > > 8a. > This feels overcomplicated -- too much information? > > SUGGESTION > depending upon the ... sync_state for the synchronized slots on the > standby at the time of failover. Only slots that were in ready > sync_state ('r') on the standby before failover can be used for > logical replication after failover > > ~~~ > > 8b. > + the slots which were in initiated sync_state ('i) and were not > + sync-ready ('r') at the time of failover will be dropped and logical > + replication for such slots can not be resumed after failover. This applies > + to the case where a logical subscription is disabled before > failover and is > + enabled after failover. If the synchronized slot due to disabled > + subscription could not be made sync-ready ('r') on standby, then the > + subscription can not be resumed after failover even when enabled. > > But isn't ALL that part pretty much redundant information for the > user? I thought these are not ready state, so they are not usable... > End-Of-Story. Isn't everything else just more like implementation > details, which the user does not need to know about? > 'sync_state' is a way to monitor the state of synchronization and I feel it is important to tell what happens with 'i' state slots. Also there was a comment to add this info in doc that disabled subscriptions are not guaranteed to be usable if enabled after failover. Thus it was added and rest of the info forms a base for that. We can trim down or rephrase if needed. > ~~~ > > 9. > + If the primary is idle, making the synchronized slot on the standby > + as sync-ready ('r') for enabled subscription may take noticeable time. > + This can be sped up by calling the > + <function>pg_log_standby_snapshot</function> function on the primary. > + </para> > > SUGGESTION > If the primary is idle, then the synchronized slots on the standby may > take a noticeable time to reach the ready ('r') sync_state. This can > be sped up by calling the > <function>pg_log_standby_snapshot</function> function on the primary. > > ====== > doc/src/sgml/system-views.sgml > > 10. > + > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>sync_state</structfield> <type>char</type> > + </para> > + <para> > + Defines slot synchronization state. This is meaningful on the physical > + standby which has enabled slots synchronization. > + </para> > > I felt that this part "which has enabled slots synchronization" should > cross-reference to the 'sync_enabled' GUC. > > ~~~ > > 11. > + <para> > + State code: > + <literal>n</literal> = none for user created slots, > + <literal>i</literal> = sync initiated for the slot but slot is not ready > + yet for periodic syncs, > + <literal>r</literal> = ready for periodic syncs. > + </para> > > I'm wondering why don't we just reuse 'd' (disabled), 'p' (pending), > 'e' (enabled) like the other tri-state attributes are using. > I think it is not a property of a slot where we say enabled/disabled. It is more like an operation and thus initiated, ready etc sounds better. These states are similar to the ones maintained for table-sync operation (SUBREL_STATE_INIT, SUBREL_STATE_READY etc) > > 12. > + <para> > + The hot standby can have any of these sync_state for the slots but on a > + hot standby, the slots with state 'r' and 'i' can neither be > used for logical > + decoded nor dropped by the user. The primary server will have sync_state > + as 'n' for all the slots. But if the standby is promoted to become the > + new primary server, sync_state can be seen 'r' as well. On this new > + primary server, slots with sync_state as 'r' and 'n' will > behave the same. > + </para></entry> > + </row> > > 12a. > /logical decoded/logical decoding/ > > ~ > > 12b. > "sync_state as 'r' and 'n' will behave the same" sounds kind of hacky. > Is there no alternative? > I am reviewing your suggestion on 'r' to 'n' conversion on promotion given later in this email. So give me some more time. > Anyway, IMO mentioning about primary server states seems overkill, > because you already said "This is meaningful on the physical standby" > which I took as implying that it is *not* meaningful from the POV of > the primary server. > In case we planned to retain 'r', it then makes sense to document that the sync_state on primary can also be 'r' if the primary was promoted from a standby, because this is a special case which the user may not be aware of. > In light of this, I'm wondering if a better name for this attribute > would be: 'standby_sync_state' > sync_state has some value for primary too. It is not null on primary. Thus the current name seems a better choice. > ====== > src/backend/access/transam/xlogrecovery.c > > 13. > + /* > + * Shutdown the slot sync workers to prevent potential conflicts between > + * user processes and slotsync workers after a promotion. Additionally, > + * drop any slots that have initiated but not yet completed the sync > + * process. > + */ > + ShutDownSlotSync(); > + slotsync_drop_initiated_slots(); > + > > Is this where maybe the 'sync_state' should also be updated for > everything so you are not left with confusion about different states > on a node that is no longer a standby node? > yes, this is the place. But this needs more thought as it may cause too much disk activity during promotion. so let me analyze and come back. > ====== > src/backend/postmaster/postmaster.c > > 14. PostmasterMain > > ApplyLauncherRegister(); > > + SlotSyncWorkerRegister(); > + > > Every other function call here is heavily commented but there is a > conspicuous absence of a comment here. > Added some comments, but not very confident on those, so let me know. > ~~~ > > 15. bgworker_should_start_now > > if (start_time == BgWorkerStart_ConsistentState) > return true; > + else if (start_time == BgWorkerStart_ConsistentState_HotStandby && > + pmState != PM_RUN) > + return true; > /* fall through */ > Change "else if" to "if" would be simpler. > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 16. > + for (opt = opts; opt->keyword != NULL; ++opt) > + { > + /* > + * If multiple dbnames are specified, then the last one will be > + * returned > + */ > + if (strcmp(opt->keyword, "dbname") == 0 && opt->val && > + opt->val[0] != '\0') > + dbname = pstrdup(opt->val); > + } > > This can use a tidier C99 style to declare 'opt' as the loop variable. > > ~~~ > > 17. > static void > libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, > - bool failover) > + bool failover) > > What is this change for? Or, if something is wrong with the indent > then anyway it should be fixed in patch 0001. > yes, it should go to patch01. Done. > ====== > src/backend/replication/logical/logical.c > > 18. > > + /* > + * Slots in state SYNCSLOT_STATE_INITIATED should have been dropped on > + * promotion. > + */ > + if (!RecoveryInProgress() && slot->data.sync_state == > SYNCSLOT_STATE_INITIATED) > + elog(ERROR, "replication slot \"%s\" was not synced completely from > the primary server", > + NameStr(slot->data.name)); > + > + /* > + * Do not allow consumption of a "synchronized" slot until the standby > + * gets promoted. > + */ > + if (RecoveryInProgress() && slot->data.sync_state != SYNCSLOT_STATE_NONE) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot use replication slot \"%s\" for logical decoding", > + NameStr(slot->data.name)), > + errdetail("This slot is being synced from the primary server."), > + errhint("Specify another replication slot."))); > + > > 18a. > > Instead of having !RecoveryInProgress() and RecoveryInProgress() in > separate conditions is the code simpler like: > > SUGGESTION > > if (RecoveryInProgress()) > { > /* Do not allow ... */ > if (slot->data.sync_state != SYNCSLOT_STATE_NONE) ... > } > else > { > /* Slots in state... */ > if (slot->data.sync_state == SYNCSLOT_STATE_INITIATED) ... > } > > ~ > > 18b. > Should the errdetail give the current state? > I think it is not needed, current info looks good enough. User can always use pg_replication_slots to monitor sync_state info. > ====== > src/backend/replication/logical/slotsync.c > > 19. > +/* > + * Number of attempts for wait_for_primary_slot_catchup() after > + * which it aborts the wait and the slot sync worker then moves > + * to the next slot creation/sync. > + */ > +#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5 > > Given this is only used within one static function, I'm wondering if > it would be tidier to also move this macro to within that function. > > ~~~ > > 20. wait_for_primary_slot_catchup > > +/* > + * Wait for remote slot to pass locally reserved position. > + * > + * Ping and wait for the primary server for > + * WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS during a slot creation, if it still > + * does not catch up, abort the wait. The ones for which wait is aborted will > + * attempt the wait and sync in the next sync-cycle. > + * > + * *persist will be set to false if the slot has disappeared or was invalidated > + * on the primary; otherwise, it will be set to true. > + */ > > 20a. > The comment doesn't say the meaning of the boolean returned. > > ~ > > 20b. > /*persist will be set/If passed, *persist will be set/ > > ~~~ > > 21. > + appendStringInfo(&cmd, > + "SELECT conflicting, restart_lsn, confirmed_flush_lsn," > + " catalog_xmin FROM pg_catalog.pg_replication_slots" > + " WHERE slot_name = %s", > + quote_literal_cstr(remote_slot->name)); > > Somehow, I felt it is more readable if the " FROM" starts on a new line. > > e.g. > "SELECT conflicting, restart_lsn, confirmed_flush_lsn, catalog_xmin" > " FROM pg_catalog.pg_replication_slots" > " WHERE slot_name = %s" > > ~~~ > > 22. > + ereport(ERROR, > + (errmsg("could not fetch slot info for slot \"%s\" from the" > + " primary server: %s", > + remote_slot->name, res->err))); > > Perhaps the message can be shortened like: > "could not fetch slot \"%s\" info from the primary server: %s" > > ~~~ > > 23. > + ereport(WARNING, > + (errmsg("slot \"%s\" disappeared from the primary server," > + " slot creation aborted", remote_slot->name))); > > Would this be better split into parts? > > SUGGESTION > errmsg "slot \"%s\" creation aborted" > errdetail "slot was not found on the primary server" > > ~~~ > > 24. > + ereport(WARNING, > + (errmsg("slot \"%s\" invalidated on the primary server," > + " slot creation aborted", remote_slot->name))); > > (similar to previous) > > SUGGESTION > errmsg "slot \"%s\" creation aborted" > errdetail "slot was invalidated on the primary server" > > ~~~ > > 25. > + /* > + * Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin > + * are expected to be valid/non-null. > + */ > > SUGGESTION > Having got a valid restart_lsn, the confirmed_lsn and catalog_xmin are > expected to be valid/non-null. > > ~~~ > > 26. slotsync_drop_initiated_slots > > +/* > + * Drop the slots for which sync is initiated but not yet completed > + * i.e. they are still waiting for the primary server to catch up. > + */ > > I found "waiting for the primary server to catch up" to be difficult > to understand without knowing the full details, but it is not really > described properly until a much larger comment that is buried in the > synchronize_one_slot(). So I think all this needs explanation up-front > in the file, which you can refer to. I have repeated this same review > comment in a couple of places. > I have updated header of file with details and gave reference here and all such similar places. > ~~~ > > 27. get_local_synced_slot_names > > +static List * > +get_local_synced_slot_names(void) > +{ > + List *localSyncedSlots = NIL; > > 27a. > It's not returning a list of "names" though, so is this an appropriate > function name? > > ~~~ > > 27b. > Suggest just call that ('localSyncedSlots') differently. > - In slotsync_drop_initiated_slots() function they are just called 'slots' > - In drop_obsolete_slots() function it is called 'local_slot_list' > > IMO it is better if all these are consistently named -- just all lists > 'slots' or all 'local_slots' or whatever. > > ~~~ > > 28. check_sync_slot_validity > > +static bool > +check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots, > + bool *locally_invalidated) > > Somehow this wording "validity" seems like a misleading function name, > because the return value has nothing to do with the slot field > invalidated. > > The validity/locally_invalidated stuff is a secondary return as a side > effect for the "true" case. > > A more accurate function name would be more like check_sync_slot_on_remote(). > > ~~~ > > 29. check_sync_slot_validity > > +static bool > +check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots, > + bool *locally_invalidated) > +{ > + ListCell *cell; > > There is inconsistent naming -- > > ListCell lc; ListCell cell; ListCell lc_slot; etc.. > > IMO the more complicated names aren't of much value -- probably > everything can be changed to 'lc' for consistency. > > ~~~ > > 30. drop_obsolete_slots > > + /* > + * Get the list of local 'synced' slot so that those not on remote could > + * be dropped. > + */ > > /slot/slots/ > > Also, I don't think it is necessary to say "so that those not on > remote could be dropped." -- That is already described in the function > comment and again in a comment later in the loop. That seems enough. > If the function name get_local_synced_slot_names() is improved a bit > the comment seems redundant because it is obvious from the function > name. > > ~~~ > > 31. > + foreach(lc_slot, local_slot_list) > + { > + ReplicationSlot *local_slot = (ReplicationSlot *) lfirst(lc_slot); > + bool local_exists = false; > + bool locally_invalidated = false; > + > + local_exists = check_sync_slot_validity(local_slot, remote_slot_list, > + &locally_invalidated); > > Shouldn't that 'local_exists' variable be called 'remote_exists'? > That's what the other comments seem to be saying. > > ~~~ > > 32. construct_slot_query > > + appendStringInfo(s, > + "SELECT slot_name, plugin, confirmed_flush_lsn," > + " restart_lsn, catalog_xmin, two_phase, failover," > + " database, pg_get_slot_invalidation_cause(slot_name)" > + " FROM pg_catalog.pg_replication_slots" > + " WHERE failover and sync_state != 'i'"); > > Just wondering if substituting the SYNCSLOT_STATE_INITIATED constant > here might be more appropriate than hardwiring 'i'. Why have a > constant but not use it? > On hold. I could not find quote_* function for a character just like we have 'quote_literal_cstr' for string. Will review. Let me know if you know. > ~~~ > > 33. synchronize_one_slot > > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > +{ > + ReplicationSlot *s; > + char sync_state = 0; > > 33a. > It seems strange that the sync_state is initially assigned something > other than the 3 legal values. Should this be defaulting to > SYNCSLOT_STATE_NONE instead? > No, that will change the flow. It should stay uninitialized if the slot is not found. I have changed assignment to '\0' for better clarity. > ~ > > 33b. > I think it is safer to default the *slot_updated = false; because the > code appears to assume it was false already which may or may not be > true. > It is initialized to false in the caller, so we are good here. > ~~~ > > 34. > + /* > + * Make sure that concerned WAL is received before syncing slot to target > + * lsn received from the primary server. > + * > + * This check should never pass as on the primary server, we have waited > + * for the standby's confirmation before updating the logical slot. > + */ > > Maybe this comment should mention up-front that it is just a "Sanity check:" > > ~~~ > > 35. > + /* > + * With hot_standby_feedback enabled and invalidations handled > + * apropriately as above, this should never happen. > + */ > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) > + { > + ereport(ERROR, > + errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization " > + " would move it backwards", remote_slot->name, > + LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), > + LSN_FORMAT_ARGS(remote_slot->restart_lsn))); > + > + goto cleanup; > + } > > 35a. > IIUC then this another comment that should say it is just a "Sanity-check:". > > ~ > > 35b. > I was wondering if there should be Assert(hot_standby_feedback) here > also. The comment "With hot_standby_feedback enabled" is a bit vague > whereas including an Assert will clarify that it must be set. > I think assert is not needed. Slot-sync worker will never start if hot_standby_feedback is disabled. If we put assert here, we need to assert at all other places too where we use other related GUCs like primary_slot_name, conn_info etc. > ~ > > 35c. > Since it says "this should never happen" then it appears elog is more > appropriate than ereport because translations are not needed, right? > > ~ > > 35d. > The ERROR will make that goto cleanup unreachable, won't it? > > ~~~ > > 36. > + /* > + * Already existing slot but not ready (i.e. waiting for the primary > + * server to catch-up), lets attempt to make it sync-ready now. > + */ > > /lets/let's/ > > ~~~ > > 37. > + /* > + * Refer the slot creation part (last 'else' block) for more details > + * on this wait. > + */ > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || > + TransactionIdPrecedes(remote_slot->catalog_xmin, > + MyReplicationSlot->data.catalog_xmin)) > + { > + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, NULL)) > + { > + goto cleanup; > + } > + } > > 37a. > Having to jump forward to understand earlier code seems backward. IMO > there should be a big comment atop this module about this subject > which the comment here can just refer to. I will write more about this > topic later (below). > > ~ > > 37b. > The extra code curly braces are not needed. > > ~~~ > > 38. > + ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready " > + "now", remote_slot->name)); > > Better to put the whole errmsg() on a newline instead of splitting the > string like that. > > ~~~ > > 39. > + /* User created slot with the same name exists, raise ERROR. */ > + else if (sync_state == SYNCSLOT_STATE_NONE) > + { > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("skipping sync of slot \"%s\" as it is a user created" > + " slot", remote_slot->name), > + errdetail("This slot has failover enabled on the primary and" > + " thus is sync candidate but user created slot with" > + " the same name already exists on the standby"))); > + } > > I felt it would be better to eliminate this case immediately up-front > when you first searched for the slot names. e.g. code like below. IIUC > this refactor also means the default sync_state can be assigned a > normal value (as I suggested above) instead of the strange assignment > to 0. I feel NULL character ('\0') is better default for local variable sync_slot as we specifically wanted it to be NULL if not assigned. Assigning it to 'SYNCSLOT_STATE_NONE' will be misleading. But moved the 'SYNCSLOT_STATE_NONE' related error though. > > + /* Search for the named slot */ > + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) > + { > + SpinLockAcquire(&s->mutex); > + sync_state = s->data.sync_state; > + SpinLockRelease(&s->mutex); > > INSERT HERE > + /* User-created slot with the same name exists, raise ERROR. */ > + if (sync_state == SYNCSLOT_STATE_NONE) > + ereport(ERROR, ... > + } > > ~~~ > > 40. > + /* Otherwise create the slot first. */ > + else > + { > > Insert a blank line above that comment for better readability (same as > done for earlier 'else' in this same function) > > ~~~ > > 41. > + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, > + remote_slot->two_phase, > + remote_slot->failover, > + SYNCSLOT_STATE_INITIATED); > + > + slot = MyReplicationSlot; > > In hindsight, the prior if/else code blocks in this function also > could have done "slot = MyReplicationSlot;" same as this -- then the > code would be much less verbose. > > ~~~ > > 42. > + SpinLockAcquire(&slot->mutex); > + slot->data.database = get_database_oid(remote_slot->database, false); > + > + namestrcpy(&slot->data.plugin, remote_slot->plugin); > + SpinLockRelease(&slot->mutex); > > IMO the code would be more readable *without* a blank line here > because the mutexed block is more obvious. > > ~~~ > > 43. > + /* > + * If the local restart_lsn and/or local catalog_xmin is ahead of > + * those on the remote then we cannot create the local slot in sync > + * with the primary server because that would mean moving the local > + * slot backwards and we might not have WALs retained for old LSN. In > + * this case we will wait for the primary server's restart_lsn and > + * catalog_xmin to catch up with the local one before attempting the > + * sync. > + */ > > 43a. > This comment describes some fundamental concepts about how this logic > works. I felt this and other comments like this should be at the top > of this slotsync.c file. Then anything that needs to mention about it > can refer to the top comment. For example, I also found other comments > like "... they are still waiting for the primary server to catch up." > to be difficult to understand without knowing these details, but I > think describing core design stuff up-front and saying "refer to the > comment atop the fil" probably would help a lot. > > ~ > > 43b. > Should "wait for the primary server's restart_lsn and..." be "wait for > the primary server slot's restart_lsn and..." ? > > ~~~ > > 44. > + { > + bool persist; > + > + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, &persist)) > + { > + /* > + * The remote slot didn't catch up to locally reserved > + * position. > + * > + * We do not drop the slot because the restart_lsn can be > + * ahead of the current location when recreating the slot in > + * the next cycle. It may take more time to create such a > + * slot. Therefore, we persist it (provided remote-slot is > + * still valid) and attempt the wait and synchronization in > + * the next cycle. > + */ > + if (persist) > + { > + ReplicationSlotPersist(); > + *slot_updated = true; > + } > + > + goto cleanup; > + } > + } > > Looking at the way this 'persist' parameter is used I felt is it too > complicated. IIUC the wait_for_primary_slot_catchup can only return > *persist = true (for a false return) when it has reached/exceeded the > number of retries and still not yet caught up. Why should > wait_for_primary_slot_catchup() pretend to know about persistence? > > In other words, I thought a more meaningful parameter/variable name > (instead of 'persist') is something like 'wait_attempts_exceeded'. IMO > that will make wait_for_primary_slot_catchup() code easier, and here > you can just say like below, where the code matches the comment > better. Thoughts? > > + if (wait_attempts_exceeded) > + { > + ReplicationSlotPersist(); > + *slot_updated = true; > + } > yes, it will make code simpler. Changed it. > ~~~ > > 45. > + > + > + /* > + * Wait for primary is either not needed or is over. Update the lsns > + * and mark the slot as READY for further syncs. > + */ > > Double blank lines? > > ~~~ > > 46. > + ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready " > + "now", remote_slot->name)); > + } > + > +cleanup: > > Better to put the whole errmsg() on a newline instead of splitting the > string like that. > > ~~~ > > 47. synchronize_slots > > +/* > + * Synchronize slots. > + * > + * Gets the failover logical slots info from the primary server and update > + * the slots locally. Creates the slots if not present on the standby. > + * > + * Returns nap time for the next sync-cycle. > + */ > +static long > +synchronize_slots(WalReceiverConn *wrconn) > > /update/updates/ > > ~~~ > > 48. > + /* The primary_slot_name is not set yet or WALs not received yet */ > + SpinLockAcquire(&WalRcv->mutex); > + if (!WalRcv || > + (WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return naptime; > + } > + SpinLockRelease(&WalRcv->mutex); > > Just wondering if the scenario of "WALS not received" is a bit more > like "no activity" so perhaps the naptime returned should be > WORKER_INACTIVITY_NAPTIME_MS here? > This may happen if walreceiver is temporarily having some issue. Longer nap is not recommended here. We should check the state again after a short nap. > > ~~~ > > 49. > + /* Construct query to get slots info from the primary server */ > + initStringInfo(&s); > + construct_slot_query(&s); > > I did not like the construct_slot_query() to be separated from this > function because it makes it too difficult to see if the slot_attr > numbers and column types in this function are correct w.r.t. that > query. IMO better when everything is in the same place where you can > see it all together. e.g. Less risk of breaking something if changes > are made. > > ~~~ > > 50. > + /* Construct the remote_slot tuple and synchronize each slot locally */ > + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); > > Normally in all the other functions the variable 'slot' was the local > ReplicationSlot but IIUC here represents a remote tuple. Making a > different name would be better like 'remote_slottup' or something > else. > Have changed it to 'tupslot' to keep it short but different from slot. > ~~~ > > 51. > + /* > + * If any of the slots get updated in this sync-cycle, retain default > + * naptime and update 'last_update_time' in slot sync worker. But if no > + * activity is observed in this sync-cycle, then increase naptime provided > + * inactivity time reaches threshold. > + */ > > I think "retain" is a slightly wrong word here because it might have > been WORKER_INACTIVITY_NAPTIME_MS in the previous cycle. > > Maybe just /retain/use/ > > ~~~ > > 52. > +/* > + * Connects primary to validate the slot specified in primary_slot_name. > + * > + * Exits the worker if physical slot with the specified name does not exist. > + */ > +static void > +validate_primary_slot(WalReceiverConn *wrconn) > > There is already a connection, so not sure if this connect should be > saying "connects to"; Maybe is should be saying more like below: > > SUGGESTION > Using the specified primary server connection, validate if the > physical slot identified by GUC primary_slot_name exists. > > Exit the worker if the slot is not found. > > ~~~ > > 53. > + initStringInfo(&cmd); > + appendStringInfo(&cmd, > + "select count(*) = 1 from pg_replication_slots where " > + "slot_type='physical' and slot_name=%s", > + quote_literal_cstr(PrimarySlotName)); > > Write the SQL keywords in uppercase. > > ~~~ > > 54. > + if (res->status != WALRCV_OK_TUPLES) > + ereport(ERROR, > + (errmsg("could not fetch primary_slot_name info from the " > + "primary: %s", res->err))); > > Shouldn't the name of the unfound slot be shown in the ereport, or > will that already appear in the res->err? > > ~~~ > > 55. > + ereport(ERROR, > + errmsg("exiting slots synchronization as slot specified in " > + "primary_slot_name is not valid")); > + > > IMO the format should be the same as I suggested (later) for all the > validate_slotsync_parameters() errors. > > Also, I think the name of the unfound slot needs to be in this message. > > So maybe result is like this: > > SUGGESTION > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: second %s is a GUC variable name */ > errhint("The primary slot \"%s\" specified by %s is not valid.", > slot_name, "primary_slot_name") > ); > > ~~~ > > 56. > +/* > + * Checks if GUCs are set appropriately before starting slot sync worker > + */ > +static void > +validate_slotsync_parameters(char **dbname) > +{ > + /* > + * Since 'enable_syncslot' is ON, check that other GUC settings > + * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo) > + * are compatible with slot synchronization. If not, raise ERROR. > + */ > + > > 56a. > I thought that 2nd comment sort of belonged in the function comment. > > ~ > > 56b. > It says "Since 'enable_syncslot' is ON", but I IIUC that is wrong > because the other function slotsync_reread_config() might detect a > change in this GUC and cause this validate_slotsync_parameters() to be > called when enable_syncslot was changed to false. > > In other words, I think you also need to check 'enable_syncslot' and > exit with appropriate ERROR same as all the other config problems. > > OTOH if this is not possible, then the slotsync_reread_config() might > need fixing instead. > 'enable_syncslot' is recently changed to PGC_POSTMASTER from PGC_SIGHUP. Thus 'slotsync_reread_config' also needs to get rid of 'enable_syncslot'. I have changed that now. Slightly changed the comment as well. > ~~~ > > 57. > + /* > + * A physical replication slot(primary_slot_name) is required on the > + * primary to ensure that the rows needed by the standby are not removed > + * after restarting, so that the synchronized slot on the standby will not > + * be invalidated. > + */ > + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0) > + ereport(ERROR, > + errmsg("exiting slots synchronization as primary_slot_name is " > + "not set")); > + > + /* > + * Hot_standby_feedback must be enabled to cooperate with the physical > + * replication slot, which allows informing the primary about the xmin and > + * catalog_xmin values on the standby. > + */ > + if (!hot_standby_feedback) > + ereport(ERROR, > + errmsg("exiting slots synchronization as hot_standby_feedback " > + "is off")); > + > + /* > + * Logical decoding requires wal_level >= logical and we currently only > + * synchronize logical slots. > + */ > + if (wal_level < WAL_LEVEL_LOGICAL) > + ereport(ERROR, > + errmsg("exiting slots synchronisation as it requires " > + "wal_level >= logical")); > + > + /* > + * The primary_conninfo is required to make connection to primary for > + * getting slots information. > + */ > + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) > + ereport(ERROR, > + errmsg("exiting slots synchronization as primary_conninfo " > + "is not set")); > + > + /* > + * The slot sync worker needs a database connection for walrcv_exec to > + * work. > + */ > + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + if (*dbname == NULL) > + ereport(ERROR, > + errmsg("exiting slots synchronization as dbname is not " > + "specified in primary_conninfo")); > + > +} > > IMO all these errors can be improved by: > - using a common format > - including errhint for the reason > - using the same tone for instructions on what to do (e.g saying must > be set, rather than what was not set) > > SUGGESTION (something like this) > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: %s is a GUC variable name */ > errhint("%s must be defined.", "primary_slot_name") > ); > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: %s is a GUC variable name */ > errhint("%s must be enabled.", "hot_standby_feedback") > ); > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: wal_level is a GUC variable name, 'logical' is a value */ > errhint("wal_level must be >= logical.") > ); > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: %s is a GUC variable name */ > errhint("%s must be defined.", "primary_conninfo") > ); > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: 'dbname' is a specific option; %s is a GUC variable name */ > errhint("'dbname' must be specified in %s.", "primary_conninfo") > ); > > ~~~ > > 58. > + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + if (*dbname == NULL) > + ereport(ERROR, > + errmsg("exiting slots synchronization as dbname is not specified in > primary_conninfo")); > + > +} > > Unnecessary blank line at the end of the function > > ~~~ > > 59. > +/* > + * Re-read the config file. > + * > + * If any of the slot sync GUCs changed, validate the values again > + * through validate_slotsync_parameters() which will exit the worker > + * if validaity fails. > + */ > > SUGGESTION > If any of the slot sync GUCs have changed, re-validate them. The > worker will exit if the check fails. > > ~~~ > > 60. > + char *conninfo = pstrdup(PrimaryConnInfo); > + char *slotname = pstrdup(PrimarySlotName); > + bool syncslot = enable_syncslot; > + bool standbyfeedback = hot_standby_feedback; > > For clarity, I would have used var names to match the old GUCs. > > e.g. > /conninfo/old_primary_conninfo/ > /slotname/old_primary_slot_name/ > /syncslot/old_enable_syncslot/ > /standbyfeedback/old_hot_standby_feedback/ > > ~~~ > > 61. > + dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + Assert(dbname); > > This code seems premature. IIUC this is only needed to detect that the > dbname was changed. But I think the prerequisite is first that the > conninfoChanged is true. So really this code should be guarded by if > (conninfoChanged) so it can be done later in the function. > Once PrimaryConnInfo is changed, we can not get old-dbname. So it is required to be done before we reach 'conninfoChanged' > ~~~ > > 62. > + if (conninfoChanged || slotnameChanged || > + (syncslot != enable_syncslot) || > + (standbyfeedback != hot_standby_feedback)) > + { > + revalidate = true; > + } > > SUGGESTION > > revalidate = conninfoChanged || slotnameChanged || > (syncslot != enable_syncslot) || > (standbyfeedback != hot_standby_feedback); > > ~~~ > > 63. > + /* > + * Since we have initialized this worker with old dbname, thus exit if > + * dbname changed. Let it get restarted and connect to new dbname > + * specified. > + */ > + if (conninfoChanged && strcmp(dbname, new_dbname) != 0) > + { > + ereport(ERROR, > + errmsg("exiting slot sync woker as dbname in " > + "primary_conninfo changed")); > + } > > 63a. > /old dbname/the old dbname/ > /new dbname/the new dbname/ > /woker/worker/ > > ~ > > 63b. > This code feels awkward. Can't this dbname check and accompanying > ERROR message be moved down into validate_slotsync_parameters(), so it > lives along with all the other GUC validation logic? Maybe you'll need > to change the validate_slotsync_parameters() parameters slightly but I > think it is much better to keep all the validation together. > > ~~~ > > 64. > + > + > +/* > + * Interrupt handler for main loop of slot sync worker. > + */ > +static void > +ProcessSlotSyncInterrupts(WalReceiverConn **wrconn) > > Double blank lines. > > ~~~ > > 65. > + > + > + if (ConfigReloadPending) > + slotsync_reread_config(); > +} > > Double blank lines > > ~~~ > > 66. slotsync_worker_onexit > > +static void > +slotsync_worker_onexit(int code, Datum arg) > +{ > + SpinLockAcquire(&SlotSyncWorker->mutex); > + SlotSyncWorker->pid = 0; > + SpinLockRelease(&SlotSyncWorker->mutex); > +} > > Should assignment use InvalidPid (-1) instead of 0? > > ~~~ > > 67. ReplSlotSyncWorkerMain > > + SpinLockAcquire(&SlotSyncWorker->mutex); > + > + Assert(SlotSyncWorker->pid == 0); > + > + /* Advertise our PID so that the startup process can kill us on promotion */ > + SlotSyncWorker->pid = MyProcPid; > + > + SpinLockRelease(&SlotSyncWorker->mutex); > > Shouldn't pid start as InvalidPid (-1) instead of Assert 0? > > ~~~ > > 68. > + /* Connect to the primary server */ > + wrconn = remote_connect(); > + > + /* > + * Connect to primary and validate the slot specified in > + * primary_slot_name. > + */ > + validate_primary_slot(wrconn); > > Maybe needs some slight rewording in the 2nd comment. "Connect to > primary server" is already said and done in the 1st part. > > ~~~ > > 69. IsSlotSyncWorker > > +/* > + * Is current process the slot sync worker? > + */ > +bool > +IsSlotSyncWorker(void) > +{ > + return SlotSyncWorker->pid == MyProcPid; > +} > > 69a. > For consistency with others like it, I thought this be called > IsLogicalSlotSyncWorker(). > > ~ > > 69b. > For consistency with the others like this, I think the extern should > be declared in logicalworker.h > > ~~~ > > 70. ShutDownSlotSync > > + SpinLockAcquire(&SlotSyncWorker->mutex); > + if (!SlotSyncWorker->pid) > + { > + SpinLockRelease(&SlotSyncWorker->mutex); > + return; > + } > > IMO should be comparing with InvalidPid (-1) here; not 0. > > ~~~ > > 71. > + SpinLockAcquire(&SlotSyncWorker->mutex); > + > + /* Is it gone? */ > + if (!SlotSyncWorker->pid) > + break; > + > + SpinLockRelease(&SlotSyncWorker->mutex); > > Ditto. bad pids should be InvalidPid (-1), not 0. > > ~~~ > > 72. SlotSyncWorkerShmemInit > > + if (!found) > + { > + memset(SlotSyncWorker, 0, size); > + SpinLockInit(&SlotSyncWorker->mutex); > + } > > Probably here the unassigned pid should be set to InvalidPid (-1), not 0. > > ~~~ > > 73. SlotSyncWorkerRegister > > + if (!enable_syncslot) > + { > + ereport(LOG, > + errmsg("skipping slots synchronization as enable_syncslot is " > + "disabled.")); > + return; > + } > > /as/because/ > > ====== > src/backend/replication/logical/tablesync.c > > 74. > #include "commands/copy.h" > +#include "commands/subscriptioncmds.h" > #include "miscadmin.h" > > There were only #include changes but no code changes. Is the #include needed? > > ====== > src/backend/replication/slot.c There is some change in way headers are included. I need to review it in detail. Keeping it on hold. I tried to explain few points on this in [1] (see last comment) [1]: https://www.postgresql.org/message-id/CAJpy0uD6dWUvBgy8MGdugf_Am4pLXTL_vqcwSeHO13v%2BMzc9KA%40mail.gmail.com > > 75. ReplicationSlotCreate > > void > ReplicationSlotCreate(const char *name, bool db_specific, > ReplicationSlotPersistency persistency, > - bool two_phase, bool failover) > + bool two_phase, bool failover, char sync_state) > > The function comment goes to trouble to describe all the parameters > except for 'failover' and 'sync_slate'. I think a failover comment > should be added in patch 0001 and then the sync_state comment should > be added in patch 0002. > > ~~~ > > 76. > + /* > + * Do not allow users to drop the slots which are currently being synced > + * from the primary to the standby. > + */ > + if (user_cmd && RecoveryInProgress() && > + MyReplicationSlot->data.sync_state != SYNCSLOT_STATE_NONE) > + { > + ReplicationSlotRelease(); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot drop replication slot \"%s\"", name), > + errdetail("This slot is being synced from the primary."))); > + } > > Should the errdetail give the current state? > I feel current info looks good. User can always use pg_replication_slots to monitor sync_state info. > > ====== > src/backend/tcop/postgres.c > > 77. > + else if (IsSlotSyncWorker()) > + { > + ereport(DEBUG1, > + (errmsg_internal("replication slot sync worker is shutting down due > to administrator command"))); > + > + /* > + * Slot sync worker can be stopped at any time. > + * Use exit status 1 so the background worker is restarted. > + */ > + proc_exit(1); > + } > > Explicitly saying "ereport(DEBUG1, errmsg_internal(..." is a bit > overkill; it is simpler to write this as "elog(DEBUG1, ....); > > ====== > src/include/replication/slot.h > > 78. > +/* The possible values for 'sync_state' in ReplicationSlotPersistentData */ > +#define SYNCSLOT_STATE_NONE 'n' /* None for user created slots */ > +#define SYNCSLOT_STATE_INITIATED 'i' /* Sync initiated for the slot but > + * not completed yet, waiting for > + * the primary server to catch-up */ > +#define SYNCSLOT_STATE_READY 'r' /* Initialization complete, ready > + * to be synced further */ > > Already questioned the same elsewhere. IIUC the same tri-state values > of other attributes might be used here too without needing to > introduce 3 new values. > > e.g. > > #define SYNCSLOT_STATE_DISABLED 'd' /* No syncing for this slot */ > #define SYNCSLOT_STATE_PENDING 'p' /* Sync is enabled but we must > wait for the primary server to catch up */ > #define SYNCSLOT_STATE_ENABLED 'e' /* Sync is enabled and the slot is > ready to be synced */ > responded in comment 11. > ~~~ > > 79. > + /* > + * Is this a slot created by a sync-slot worker? > + * > + * Relevant for logical slots on the physical standby. > + */ > + char sync_state; > + > > I assumed that "Relevant for" means "Only relevant for". It should say that. > > If correct, IMO a better field name might be 'standby_sync_state' > sync_state has some value for primary too. It is not null on primary. Thus current name seems a better choice. > ====== > src/test/recovery/t/050_verify_slot_order.pl > > 80. > +$backup_name = 'backup2'; > +$primary->backup($backup_name); > + > +# Create standby3 > +my $standby3 = PostgreSQL::Test::Cluster->new('standby3'); > +$standby3->init_from_backup( > + $primary, $backup_name, > + has_streaming => 1, > + has_restoring => 1); > > The mixture of 'backup2' for 'standby3' seems confusing. Is there a > reason to call it backup2? > > ~~~ > > 81. > +# Verify slot properties on the standby > +is( $standby3->safe_psql('postgres', > + q{SELECT failover, sync_state FROM pg_replication_slots WHERE > slot_name = 'lsub1_slot';} > + ), > + "t|r", > + 'logical slot has sync_state as ready and failover as true on standby'); > > It might be better if the message has the same order as the SQL. Eg. > "failover as true and sync_state as ready". > > ~~~ > > 82. > +# Verify slot properties on the primary > +is( $primary->safe_psql('postgres', > + q{SELECT failover, sync_state FROM pg_replication_slots WHERE > slot_name = 'lsub1_slot';} > + ), > + "t|n", > + 'logical slot has sync_state as none and failover as true on primary'); > + > > It might be better if the message has the same order as the SQL. Eg. > "failover as true and sync_state as none". > > ~~~ > > 83. > +# Test to confirm that restart_lsn of the logical slot on the primary > is synced to the standby > > IMO the major test parts (like this one) may need more highlighting "# > ---------------------" so those comments don't get lost among all the > other comments. > > ~~~ > > 84. > +# let the slots get synced on the standby > +sleep 2; > > Won't this make the test prone to failure on slow machines? Is there > not a more deterministic way to wait for the sync? > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
pgsql-hackers by date: