Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAA4eK1+3tCskiT3ma5NCdqBX+_BaJ1sRyMFrJB-ObN7HHe_8jg@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> ---
> Since Two processes (e.g. the slotsync worker and
> pg_sync_replication_slots()) concurrently fetch and update the slot
> information, there is a race condition where slot's
> confirmed_flush_lsn goes backward.
>

Right, this is possible, though there shouldn't be a problem because
slotsync is anyway an async process. As long as we hold restart_lsn,
the required WAL won't be removed. Having said that, I can think of
two ways to avoid it:

(a) We can have a flag in shared memory with which we can detect
whether any other process is doing slot synchronization, and then
either error out, simply wait, or take a nowait kind of parameter from
the user to decide what to do. If this is feasible, we can simply
error out in the first version and extend it later if we see any use
cases for the same.

(b) Similar to restart_lsn, if confirmed_flush_lsn is getting moved
back, raise an error. This is good for now, but in the future we may
still have another similar issue.

So I would prefer (a) among these, but I am fine if you prefer (b) or
have some other ideas, like just noting in comments that this is a
harmless case that can happen only very rarely.

>
> ---
> + It is recommended that subscriptions are first disabled before promoting
> + the standby and are enabled back after altering the connection string.
>
> I think it's better to describe the reason why it's recommended to
> disable subscriptions before the standby promotion.
>

Agreed. The reason I see for this is that if we don't disable the
subscription before promoting the standby and changing the connection
string, there is a chance that the old primary comes back and the
subscriber can have some additional data, though the chances of that
are low.

> ---
> +/* Slot sync worker objects */
> +extern PGDLLIMPORT char *PrimaryConnInfo;
> +extern PGDLLIMPORT char *PrimarySlotName;
>
> These two variables are declared also in xlogrecovery.h. Is it
> intentional? If so, I think it's better to write comments.
>
> ---
> Global functions and variables used by the slotsync worker are
> declared in logicalworker.h and worker_internal.h. But is it really
> okay to make a dependency between the slotsync worker and logical
> replication workers? IIUC the slotsync worker is conceptually a
> separate feature from the logical replication. I think the slotsync
> worker can have its own header file.
>

+1.

>
> ---
> + Confirm that the standby server is not lagging behind the subscribers.
> + This step can be skipped if
> + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>
> + has been correctly configured.
>
> How can the user confirm if standby_slot_names is correctly configured?
>

I think users can refer to the LOGs to see if it has changed since the
first time it was configured. I tried with an existing parameter and
see the following in the LOG:

LOG: received SIGHUP, reloading configuration files
2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" changed to "on"

If the user can't confirm, then it is better to follow the steps
mentioned in the patch. Do you want something else to be written in
the docs for this? If so, what?

--
With Regards,
Amit Kapila.
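
For illustration only, the two options discussed above for the
confirmed_flush_lsn race could be sketched roughly as follows. This is
a standalone toy in C11, not code from the patch: every name here
(ToySlotSyncState, toy_sync_slot, and so on) is hypothetical, a plain
atomic flag stands in for whatever shared-memory flag the real
implementation would use, and returning a result code stands in for
raising an error.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; not PostgreSQL's XLogRecPtr. */
typedef uint64_t ToyLsn;

/* Hypothetical shared state for one synced slot. */
typedef struct ToySlotSyncState
{
    atomic_flag syncing;          /* set while some process is syncing */
    ToyLsn      confirmed_flush;  /* last locally synced confirmed_flush */
} ToySlotSyncState;

typedef enum ToySyncResult
{
    TOY_SYNC_OK,
    TOY_SYNC_BUSY,      /* option (a): another process is already syncing */
    TOY_SYNC_BACKWARD   /* option (b): remote value is older than local */
} ToySyncResult;

/*
 * Try to apply a confirmed_flush value fetched from the primary.
 * Option (a): refuse to run if another sync is in progress.
 * Option (b): refuse to move confirmed_flush backward.
 */
ToySyncResult
toy_sync_slot(ToySlotSyncState *state, ToyLsn remote_confirmed_flush)
{
    ToySyncResult result = TOY_SYNC_OK;

    assert(state != NULL);

    /* test_and_set returns the previous value: true means "already set". */
    if (atomic_flag_test_and_set(&state->syncing))
        return TOY_SYNC_BUSY;

    if (remote_confirmed_flush < state->confirmed_flush)
        result = TOY_SYNC_BACKWARD;     /* leave the local value untouched */
    else
        state->confirmed_flush = remote_confirmed_flush;

    /* Release the flag so the next sync attempt can proceed. */
    atomic_flag_clear(&state->syncing);
    return result;
}
```

In this sketch a caller that gets TOY_SYNC_BUSY would error out, which
corresponds to the "simply error out in the first version" variant of
(a); waiting or a nowait parameter would be extensions on top of the
same flag.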