Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uAb7j2ZNVnm_Mvt+ofCvK1Wh17-d-Jm5ZCq=6V0k327xA@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby; RE: Synchronizing slots from primary to standby; Re: Synchronizing slots from primary to standby; RE: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Wed, Oct 18, 2023 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 17, 2023 at 2:01 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Oct 17, 2023 at 12:44 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > FYI - the latest patch failed to apply.
> > >
> > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> > > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
> > > error: patch failed: src/include/utils/guc_hooks.h:160
> > > error: src/include/utils/guc_hooks.h: patch does not apply
> >
> > Rebased v24. PFA.
> >
>
> Few comments:
> ==============
> 1.
> + List of physical replication slots that logical replication with failover
> + enabled waits for.
>
> /logical replication/logical replication slots
>
> 2.
> If
> + <varname>enable_syncslot</varname> is not enabled on the
> + corresponding standbys, then it may result in indefinite waiting
> + on the primary for physical replication slots configured in
> + <varname>standby_slot_names</varname>
> + </para>
>
> Why the above leads to indefinite wait? I think we should just ignore
> standby_slot_names and probably LOG a message in the server for the
> same.
>
> 3.
> +++ b/src/backend/replication/logical/tablesync.c
> @@ -1412,7 +1412,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
>          */
>         walrcv_create_slot(LogRepWorkerWalRcvConn,
>                            slotname, false /* permanent */ , false /* two_phase */ ,
> -                          CRS_USE_SNAPSHOT, origin_startpos);
> +                          false /* enable_failover */ , CRS_USE_SNAPSHOT,
> +                          origin_startpos);
>
> As per this code, we won't enable failover for tablesync slots. So,
> what happens if we need to failover to new node after the tablesync
> worker has reached SUBREL_STATE_FINISHEDCOPY or SUBREL_STATE_DATASYNC?
> I think we won't be able to continue replication from failed over
> node. If this theory is correct, we have two options (a) enable
> failover for sync slots as well, if it is enabled for main slot; but
> then after we drop the slot on primary once sync is complete, same
> needs to be taken care at standby. (b) enable failover even for the
> main slot after all tables are in ready state, something similar to
> what we do for two_phase.
>
> 4.
> +    /* Verify syntax */
> +    if (!validate_slot_names(newval, &elemlist))
> +        return false;
> +
> +    /* Now verify if these really exist and have correct type */
> +    if (!validate_standby_slots(elemlist))
>
> These two functions serve quite similar functionality which makes
> their naming quite confusing. Can we directly move the functionality
> of validate_slot_names() into validate_standby_slots()?
>
> 5.
> +SlotSyncInitConfig(void)
> +{
> +    char       *rawname;
> +
> +    /* Free the old one */
> +    list_free(standby_slot_names_list);
> +    standby_slot_names_list = NIL;
> +
> +    if (strcmp(standby_slot_names, "") != 0)
> +    {
> +        rawname = pstrdup(standby_slot_names);
> +        SplitIdentifierString(rawname, ',', &standby_slot_names_list);
>
> How does this handle the case where '*' is specified for standby_slot_names?
>
>
> --
> With Regards,
> Amit Kapila.

PFA the v25 patch set. The changes are:

1) 'enable_failover' is renamed to 'failover'.
2) ALTER SUBSCRIPTION changes to support 'failover'.
3) Fixes a bug in patch001 wherein a change to standby_slot_names was not taken into account in the code path where logical walsenders wait for the standby's confirmation. Now, if standby_slot_names is changed during the wait, the wait is restarted using the new standby_slot_names.
4) Addresses comments by Bertrand and Amit in [1], [2], [3].

The changes are mostly in patch001, with very few in patch002.
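To illustrate change 3 above, here is a minimal, self-contained C sketch of a wait that is restarted when standby_slot_names changes mid-wait. It is not the patch's code: parse_standby_slot_names(), wait_for_standby_slots(), config_reload_pending (a stand-in for the flag a SIGHUP handler would set), and the get_lsn/sleep_fn callbacks are all hypothetical. The simple splitter does not replicate SplitIdentifierString() (no quoting support, and '*' is treated as an ordinary name).

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's WAL position type */

#define MAX_STANDBY_SLOTS 8
#define SLOT_NAME_LEN 64

/* Hypothetical parsed form of standby_slot_names. */
typedef struct StandbySlotList
{
    int         nslots;
    char        names[MAX_STANDBY_SLOTS][SLOT_NAME_LEN];
} StandbySlotList;

/* Hypothetical stand-ins for the GUC value and the config-reload flag. */
static char standby_slot_names[256] = "";
static volatile sig_atomic_t config_reload_pending = 0;

/* Hypothetical callbacks standing in for slot lookup and the latch wait. */
typedef XLogRecPtr (*SlotLsnFn) (const char *slot_name);
typedef void (*SleepFn) (int iteration);

/*
 * Split a comma-separated list of slot names, trimming blanks.
 * Returns false for an empty element, an overlong name, or too many names.
 */
static bool
parse_standby_slot_names(const char *raw, StandbySlotList *out)
{
    const char *p = raw;

    memset(out, 0, sizeof(*out));
    if (*p == '\0')
        return true;            /* empty setting: nothing to wait for */

    for (;;)
    {
        while (*p == ' ')
            p++;
        const char *start = p;

        while (*p != ',' && *p != '\0')
            p++;
        const char *end = p;

        while (end > start && end[-1] == ' ')
            end--;
        size_t      len = (size_t) (end - start);

        if (len == 0 || len >= SLOT_NAME_LEN || out->nslots >= MAX_STANDBY_SLOTS)
            return false;
        memcpy(out->names[out->nslots], start, len);
        out->names[out->nslots][len] = '\0';
        out->nslots++;

        if (*p == '\0')
            return true;
        p++;                    /* skip the comma */
    }
}

/*
 * Wait until every listed slot has confirmed 'target'. If a reload is
 * signalled mid-wait, re-parse the setting and restart the check against
 * the new list. Gives up after max_iters polls.
 */
static bool
wait_for_standby_slots(XLogRecPtr target, SlotLsnFn get_lsn,
                       SleepFn sleep_fn, int max_iters, int *restarts)
{
    StandbySlotList list;

    assert(restarts != NULL);
    if (!parse_standby_slot_names(standby_slot_names, &list))
        return false;

    for (int i = 0; i < max_iters; i++)
    {
        if (config_reload_pending)
        {
            config_reload_pending = 0;
            if (!parse_standby_slot_names(standby_slot_names, &list))
                return false;
            (*restarts)++;
        }

        bool        caught_up = true;

        for (int s = 0; s < list.nslots && caught_up; s++)
            caught_up = get_lsn(list.names[s]) >= target;
        if (caught_up)
            return true;

        sleep_fn(i);            /* stand-in for a timed latch wait */
    }
    return false;
}
```

The sketch re-parses the whole list and re-checks every slot after a reload rather than remembering which slots had already caught up; that keeps the restart logic trivial at the cost of a few redundant position lookups.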
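Option (b) from Amit's comment 3 (enable failover for the main slot only once all tables are in the ready state, similar to two_phase) could look roughly like the following sketch. It is a hypothetical illustration, not code from the patch: FailoverState and next_failover_state() are invented names, the pending/enabled idea is modeled on the two_phase tri-state, and only the per-table letters ('i', 'd', 'f', 's', 'r') mirror pg_subscription_rel.srsubstate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical failover state, modeled on the two_phase tri-state. */
typedef enum FailoverState
{
    FAILOVER_DISABLED = 'd',
    FAILOVER_PENDING = 'p',
    FAILOVER_ENABLED = 'e'
} FailoverState;

/* Per-table sync states, mirroring the pg_subscription_rel.srsubstate letters. */
typedef enum RelSyncState
{
    REL_INIT = 'i',
    REL_DATASYNC = 'd',
    REL_FINISHEDCOPY = 'f',
    REL_SYNCDONE = 's',
    REL_READY = 'r'
} RelSyncState;

/* True when no table is still in the initial sync phase. */
static bool
all_tables_ready(const RelSyncState *states, size_t n)
{
    assert(states != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (states[i] != REL_READY)
            return false;
    return true;
}

/*
 * A subscription that requested failover starts PENDING and moves to
 * ENABLED only when every table is READY; at that point the caller would
 * alter the main slot to enable failover. Tablesync slots are expected to
 * be gone by then, so none of them needs to survive a failover.
 */
static FailoverState
next_failover_state(FailoverState cur, const RelSyncState *states, size_t n)
{
    if (cur == FAILOVER_PENDING && all_tables_ready(states, n))
        return FAILOVER_ENABLED;
    return cur;
}
```

Compared with option (a), this avoids having to mirror tablesync slot drops on the standby, at the cost of a window during initial sync in which the subscription cannot fail over.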
Thank you, Ajin, for working on the ALTER SUBSCRIPTION changes and adding more TAP tests for 'failover'.

[1]: https://www.postgresql.org/message-id/2742485f-4118-4fb4-9f94-8150de9e7d7e%40gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1JcBG6TJ3o5iUd4z0BuTbciLV3dK4aKgb7OgrNGoLcfSQ%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAA4eK1J6BqO5%3DueFAQO%2BaYyHLaU-oCHrrVFJqHS-i0Ce9aPY2w%40mail.gmail.com

thanks
Shveta