Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAA4eK1KYAjkRRSV-NAfDq=4GyHpd2igs_8se0uW-LNEF2RbaRA@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Tue, Feb 6, 2024 at 3:57 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Feb 6, 2024 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Feb 6, 2024 at 3:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > ---
> > > > > > Since Two processes (e.g. the slotsync worker and
> > > > > > pg_sync_replication_slots()) concurrently fetch and update the slot
> > > > > > information, there is a race condition where slot's
> > > > > > confirmed_flush_lsn goes backward.
> > > > > >
> > > > >
> > > > > Right, this is possible, though there shouldn't be a problem because
> > > > > anyway, slotsync is an async process. Till we hold restart_lsn, the
> > > > > required WAL won't be removed. Having said that, I can think of two
> > > > > ways to avoid it: (a) We can have some flag in shared memory using
> > > > > which we can detect whether any other process is doing slot
> > > > > syncronization and then either error out at that time or simply wait
> > > > > or may take nowait kind of parameter from user to decide what to do?
> > > > > If this is feasible, we can simply error out for the first version and
> > > > > extend it later if we see any use cases for the same (b) similar to
> > > > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an
> > > > > error, this is good for now but in future we may still have another
> > > > > similar issue, so I would prefer (a) among these but I am fine if you
> > > > > prefer (b) or have some other ideas like just note down in comments
> > > > > that this is a harmless case and can happen only very rarely.
> > > >
> > > > Thank you for sharing the ideas. I would prefer (a). For (b), the same
> > > > issue still happens for other fields.
> > >
> > > I agree that (a) looks better. On a separate note, while looking at
> > > this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there
> > > be an optional parameter to give one slot or multiple slots or all
> > > slots as default, that will give better control to the user no?
> > >
> >
> > As of now, we want to give functionality similar to slotsync worker
> > with a difference that users can use this new function for planned
> > switchovers. So, syncing all failover slots by default. I think if
> > there is a use case to selectively sync some of the failover slots
> > then we can probably extend this function and slotsync worker as well.
> > Normally, if the primary goes down due to whatever reason users would
> > want to restart the replication for all the defined publications via
> > existing failover slots. Why would anyone want to do it partially?
>
> If we consider the usability of such a function (I mean as it is
> implemented now, without any argument) one use case could be that if
> the slot sync worker is not keeping up or at some point in time the
> user doesn't want to wait for the worker to do this instead user can
> do it by himself.
>

Possibly, but I was imagining that it would be used for planned
switchover cases and also for testing the core sync slot functionality
in our TAP tests.

> So now if we have such a functionality then it would be even better to
> extend it to selectively sync the slot. For example, if there is some
> issue in syncing all slots, maybe some bug or taking a long time to
> sync because there are a lot of slots but if the user needs to quickly
> failover and he/she is interested in only a couple of slots then such
> a option could be helpful. no?
>

I see your point but not sure how useful it is in the field. I am fine
if others also think such a parameter will be useful and anyway I
think we can even extend it after v1 is done.
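
[Illustrative sketch, not part of the original message.] The two options
discussed in the quoted text can be modelled in a few lines of Python.
This is only a toy model under stated assumptions: SlotSyncState,
SyncInProgress, sync_slot and parse_lsn are hypothetical names, a
threading.Lock stands in for whatever would protect a shared-memory
flag, and slots are plain dicts. It is not the patch's code. The one
grounded detail is the LSN text format, two hexadecimal numbers
separated by a slash, which together form a 64-bit position.

```python
import threading


def parse_lsn(text):
    """Convert an LSN string such as '0/3000060' to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


class SyncInProgress(Exception):
    """Raised when another process is already synchronizing slots."""


class SlotSyncState:
    """Hypothetical stand-in for a flag kept in shared memory (option a)."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the flag itself
        self.syncing = False

    def begin(self):
        # "Error out" variant: fail immediately instead of waiting.
        with self._mutex:
            if self.syncing:
                raise SyncInProgress("slot synchronization already in progress")
            self.syncing = True

    def end(self):
        with self._mutex:
            self.syncing = False


def sync_slot(state, local_slot, remote_slot):
    """Copy remote slot fields into local_slot while holding the flag."""
    state.begin()  # raises if someone else holds the flag; flag is left untouched
    try:
        # Option (b), shown only for comparison: refuse to move
        # confirmed_flush_lsn backward.
        old = parse_lsn(local_slot["confirmed_flush_lsn"])
        new = parse_lsn(remote_slot["confirmed_flush_lsn"])
        if new < old:
            raise ValueError("confirmed_flush_lsn would move backward")
        local_slot.update(remote_slot)
    finally:
        state.end()
```

With the flag in place, only one caller at a time fetches and applies
remote slot data, so the backward-LSN check should never fire in this
model. It is included to show that option (b) guards a single field,
whereas option (a) covers every field copied inside sync_slot.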

--
With Regards,
Amit Kapila.