Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | ZgayTFIhLfzhpHci@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
Hi,

On Fri, Mar 29, 2024 at 02:35:22PM +0530, Amit Kapila wrote:
> On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Fri, Mar 29, 2024 at 07:23:11AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > > > Attach a new version patch which fixed an un-initialized variable
> > > > > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > > > > 040 tap-test so that we can analyze the possible CFbot failures easily.
> > > > >
> > > >
> > > > Thanks!
> > > >
> > > > +   if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
> > > > +   {
> > > > +       /*
> > > > +        * By advancing the restart_lsn, confirmed_lsn, and xmin using
> > > > +        * fast-forward logical decoding, we ensure that the required snapshots
> > > > +        * are saved to disk. This enables logical decoding to quickly reach a
> > > > +        * consistent point at the restart_lsn, eliminating the risk of missing
> > > > +        * data during snapshot creation.
> > > > +        */
> > > > +       pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> > > > +                                           found_consistent_point);
> > > > +       ReplicationSlotsComputeRequiredLSN();
> > > > +       updated_lsn = true;
> > > > +   }
> > > >
> > > > Instead of using pg_logical_replication_slot_advance() for each synced slot and
> > > > during sync cycles what about?:
> > > >
> > > > - keep sync slot synchronization as it is currently (not using
> > > >   pg_logical_replication_slot_advance())
> > > > - create "an hidden" logical slot if sync slot feature is on
> > > > - at the time of promotion use pg_logical_replication_slot_advance() on this
> > > >   hidden slot only to advance to the max lsn of the synced slots
> > > >
> > > > I'm not sure that would be enough, just asking your thoughts on this (benefits
> > > > would be to avoid calling pg_logical_replication_slot_advance() on each sync
> > > > slots and during the sync cycles).
> > >
> > > Thanks for the idea !
> > >
> > > I considered about this. I think advancing the "hidden" slot on promotion may be a
> > > bit late, because if we cannot reach the consistent point after advancing the
> > > "hidden" slot, then it means we may need to remove all the synced slots as we
> > > are not sure if they are usable(will not loss data) after promotion.
> >
> > What about advancing the hidden slot during the sync cycles then?
> >
> > > The current approach is to mark such un-consistent slot as temp and persist
> > > them once it reaches consistent point, so that user can ensure the slot can be
> > > used after promotion once persisted.
> >
> > Right, but do we need to do so for all the sync slots? Would a single hidden
> > slot be enough?
> >
>
> Even if we mark one of the synced slots as persistent without reaching
> a consistent state, it could create a problem after promotion. And,
> how a single hidden slot would serve the purpose, different synced
> slots will have different restart/confirmed_flush LSN and we won't be
> able to perform advancing for those using a single slot. For example,
> say for first synced slot, it has not reached a consistent state and
> then how can it try for the second slot? This sounds quite tricky to
> make work. We should go with something simple where the chances of
> introducing bugs are lesser.

Yeah, better to go with something simple.

+   if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
+   {
+       /*
+        * By advancing the restart_lsn, confirmed_lsn, and xmin using
+        * fast-forward logical decoding, we ensure that the required snapshots
+        * are saved to disk. This enables logical decoding to quickly reach a
+        * consistent point at the restart_lsn, eliminating the risk of missing
+        * data during snapshot creation.
+        */
+       pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
+                                           found_consistent_point);

In our case, what about skipping WaitForStandbyConfirmation() in
pg_logical_replication_slot_advance()? (It could go until the
RecoveryInProgress() check in StandbySlotsHaveCaughtup() if we don't skip it).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com