Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAA4eK1J6BqO5=ueFAQO+aYyHLaU-oCHrrVFJqHS-i0Ce9aPY2w@mail.gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Tue, Oct 17, 2023 at 2:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Oct 17, 2023 at 12:44 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > FYI - the latest patch failed to apply. > > > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch > > error: patch failed: src/include/utils/guc_hooks.h:160 > > error: src/include/utils/guc_hooks.h: patch does not apply > > Rebased v24. PFA. > Few comments: ============== 1. + List of physical replication slots that logical replication with failover + enabled waits for. /logical replication/logical replication slots 2. If + <varname>enable_syncslot</varname> is not enabled on the + corresponding standbys, then it may result in indefinite waiting + on the primary for physical replication slots configured in + <varname>standby_slot_names</varname> + </para> Why the above leads to indefinite wait? I think we should just ignore standby_slot_names and probably LOG a message in the server for the same. 3. +++ b/src/backend/replication/logical/tablesync.c @@ -1412,7 +1412,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) */ walrcv_create_slot(LogRepWorkerWalRcvConn, slotname, false /* permanent */ , false /* two_phase */ , - CRS_USE_SNAPSHOT, origin_startpos); + false /* enable_failover */ , CRS_USE_SNAPSHOT, + origin_startpos); As per this code, we won't enable failover for tablesync slots. So, what happens if we need to failover to new node after the tablesync worker has reached SUBREL_STATE_FINISHEDCOPY or SUBREL_STATE_DATASYNC? I think we won't be able to continue replication from failed over node. If this theory is correct, we have two options (a) enable failover for sync slots as well, if it is enabled for main slot; but then after we drop the slot on primary once sync is complete, same needs to be taken care at standby. (b) enable failover even for the main slot after all tables are in ready state, something similar to what we do for two_phase. 4. + /* Verify syntax */ + if (!validate_slot_names(newval, &elemlist)) + return false; + + /* Now verify if these really exist and have correct type */ + if (!validate_standby_slots(elemlist)) These two functions serve quite similar functionality which makes their naming quite confusing. Can we directly move the functionality of validate_slot_names() into validate_standby_slots()? 5. +SlotSyncInitConfig(void) +{ + char *rawname; + + /* Free the old one */ + list_free(standby_slot_names_list); + standby_slot_names_list = NIL; + + if (strcmp(standby_slot_names, "") != 0) + { + rawname = pstrdup(standby_slot_names); + SplitIdentifierString(rawname, ',', &standby_slot_names_list); How does this handle the case where '*' is specified for standby_slot_names? -- With Regards, Amit Kapila.
pgsql-hackers by date: