Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAHut+PsuSWjm7U_sVnL8FXZ7ZQcfCcT44kAK7i6qMG35Cwjy3A@mail.gmail.com Whole thread Raw |
In response to | RE: Synchronizing slots from primary to standby ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses |
RE: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
Hi. Here are some review comments for the patch v39_2-0001. Multiple items from my previous review [1] seemed unanswered, so it wasn't clear if they were discarded because they were wrong or maybe accidently missed. I've repeated all those again here, as well as some new comments. ====== 1. General. Previously (see [1] #0) I asked a question about if there is some documentation missing. Seems not yet answered. ====== Commit message 2. Users can set this flag during CREATE SUBSCRIPTION or during pg_create_logical_replication_slot API. Examples: CREATE SUBSCRIPTION mysub CONNECTION '..' PUBLICATION mypub WITH (failover = true); (failover is the last arg) SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput', false, true, true); ~ I felt it is better to say "Ex1" / "Ex2" (or "1" / "2" or something similar) to indicate better where these examples start and finish, otherwise they just sort of get lost among the text. ====== doc/src/sgml/catalogs.sgml 3. From previous review ([1] #6) I suggested reordering fields. Hous-san wrote: "but I think the functionality of two fields are different and I didn’t find the correlation between two fields except for the type of value." Yes, that is true. OTOH, I felt grouping the attributes by the same types made the docs easier to read. ====== src/backend/commands/subscriptioncmds.c 4. CreateSubscription + /* + * If only the slot_name is specified (without create_slot option), + * it is possible that the user intends to use an existing slot on + * the publisher, so here we enable failover for the slot if + * requested. + */ + else if (opts.slot_name && failover_enabled) + { Unanswered question from previous review (see [1] #11a). i.e. How does this condition ensure that *only* the slot name was specified (like the comment is saying)? ~~~ 5. AlterSubscription errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when two_phase is enabled"), errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && opts.copy_data) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when failover is enabled"), + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); + There are translations issues same as reported in my previous review (see [1] #12b and also several other places as noted in [1]). Hou-san replied that I "can start a separate thread to change the twophase related messages, and we can change accordingly if it's accepted.", but that's not right IMO because it is only the fact that this sysncslot patch is reusing a similar message that warrants the need to extract a "common" message part in the first place. So I think it is responsibility if this sycslot patch to make this change. ====== src/backend/replication/logical/tablesync.c 6. process_syncing_tables_for_apply + if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING) + ereport(LOG, + (errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase can be enabled", + MySubscription->name))); + + if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING) + ereport(LOG, + (errmsg("logical replication apply worker for subscription \"%s\" will restart so that failover can be enabled", + MySubscription->name))); 6a. You may end up log 2 restart messages for the same restart. Is it OK? ~ 6b. This is another example where you should share the same common message (for less translations) ====== src/backend/replication/logical/worker.c 7. + * The logical slot on the primary can be synced to the standby by specifying + * the failover = true when creating the subscription. Enabling failover allows + * us to smoothly transition to the standby in case the primary gets promoted, + * ensuring that we can subscribe to the new primary without losing any data. /the failover = true/the failover = true option/ or /the failover = true/failover = true/ ~~~ 8. + #include "postgres.h" Unnecessary extra blank line ====== src/backend/replication/slot.c 9. validate_standby_slots There was no reply to the comment in my previous review (see [1] #27). Maybe you disagree or maybe accidentally overlooked? ~~~ 10. check_standby_slot_names In previous review I asked ([1] #28) why a special check was needed for "*". Hou-san replied that "SplitIdentifierString() does not give error for '*' and '*' can be considered as valid value which if accepted can mislead user". Sure, but won't the code then just try to find if there is a replication slot called "*" and that will fail. That was my point, if the slot name lookup is going to fail anyway then why have the extra code for the special "*" case up-front? Note -- I haven't tried it, so maybe code doesn't work like I think it does. ====== src/backend/replication/walsender.c 11. PhysicalWakeupLogicalWalSnd No reply to my previous review comment ([1] #33). Not done? Disagreed, or accidentally missed? ~~~ 12. WalSndFilterStandbySlots + /* + * If logical slot name is given in standby_slot_names, give WARNING + * and skip it. Since it is harmless, so WARNING should be enough, no + * need to error-out. + */ + else if (SlotIsLogical(slot)) + warningfmt = _("cannot have logical replication slot \"%s\" in parameter \"%s\", ignoring"); I previously raised an issue (see [1] #35) thinking this could not happen. Hou-san explained how it might happen ("user could drop the logical slot and recreate a physical slot with the same name without changing the GUC.") so this code was necessary. That is OK, but I think your same explanation in the code commen. ~~~ 13. WalSndFilterStandbySlots + standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc); I previously raised issue (see [1] #36). Hou-san replied "I think it's OK to remove slots if it's invalidated, dropped, or was changed to logical one as we don't need to wait for these slots to catch up anymore." Sure, maybe code is fine, but my point was that the code is removing elements *more* scenarios than are mentioned by the function comment, so maybe update that function comment for all the removal scenarios. ~~~ 14. WalSndWaitForStandbyConfirmation The comment change from my previous review ([1] #37) not done. Disagreed, or accidentally missed? ~~~ 15. WalSndWaitForStandbyConfirmation The question about calling ConditionVariablePrepareToSleep in my previous review ([1] #39) not answered. Accidentally missed? ~~~ 16. WalSndWaitForWal if (RecentFlushPtr != InvalidXLogRecPtr && loc <= RecentFlushPtr) - return RecentFlushPtr; + { + WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots); It is better to use XLogRecPtrIsInvalid macro here. I know it was not strictly added by your patch, but so much else changed nearby so I thought this should be fixed at the same time. ====== src/bin/pg_upgrade/info.c 17. get_old_cluster_logical_slot_infos + slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots); Excessive whitespace. ====== [1] My previous review of v35-0001. https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD55bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: