Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAA4eK1KNRpQVxALa-h17XNnF2y5Ew=Ga=gTVZpr+CJa-o+xg-A@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Fri, Feb 2, 2024 at 6:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Feb 1, 2024 at 12:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > BTW I've tested the following switch/fail-back scenario but it seems
> > > not to work fine. Am I missing something?
> > >
> > > Setup:
> > > node1 is the primary, node2 is the physical standby for node1, and
> > > node3 is the subscriber connecting to node1.
> > >
> > > Steps:
> > > 1. [node1]: create a table and a publication for the table.
> > > 2. [node2]: set enable_syncslot = on and start (to receive WALs from node1).
> > > 3. [node3]: create a subscription with failover = true for the publication.
> > > 4. [node2]: promote to the new primary.
> > > 5. [node3]: alter subscription to connect the new primary, node2.
> > > 6. [node1]: stop, set enable_syncslot = on (and other required
> > > parameters), then start as a new standby.
> > >
> > > Then I got the error "exiting from slot synchronization because same
> > > name slot "test_sub" already exists on the standby".
> > >
> > > The logical replication slot that was created on the old primary
> > > (node1) has been synchronized to the old standby (node2). Therefore on
> > > node2, the slot's "synced" field is true. However, once node1 starts
> > > as the new standby with slot synchronization, the slotsync worker
> > > cannot synchronize the slot because the slot's "synced" field on the
> > > primary is false.
> > >
> >
> > Yeah, we avoided doing anything in this case because the user could
> > have manually created another slot with the same name on standby.
> > Unlike WAL, slots can be modified on standby as we allow decoding on
> > standby, so we can't allow overwriting the existing slots. We won't
> > be able to distinguish whether the existing slot was a slot that the
> > user wants to sync with primary or a slot created on standby to
> > perform decoding. I think in this case user first needs to drop the
> > slot on new standby.
>
> Yes, but if we do a switch-back further (i.e. in above case, node1
> backs to the primary again and node2 becomes the standby again), the
> user doesn't need to remove failover slots since they are already
> marked as "synced".

But I think in this case node2's timeline will be ahead of node1's, so
will we be able to make node2 follow node1 again without any additional
steps? One thing is not clear to me: after promotion the timeline
changes in WAL, so the locations in slots will be as per the new
timeline. After that, will it be safe to sync slots from the new
primary to the old primary? In general, after failover we recommend
running pg_rewind if the old primary has to follow the new primary, to
account for divergence in WAL. So I am not sure we can safely start
syncing slots on the old primary from the new primary; consider that on
the new primary, a slot with the same name may have been dropped and
re-created multiple times. We could probably reset all the fields of
the existing slot the first time we sync it, or do something like that,
but I think it would be better to just re-create the slot.

> I wonder if we could do something automatically to
> reduce the user's operation.

One possibility is that we forcefully drop and re-create the slot, or
directly overwrite the slot contents, but that would probably be better
done via some GUC or slot-level parameter. I feel we should leave this
for another day. For the first version, we can document that an error
will occur if a slot with the same name already exists on the standby,
so users need to ensure that no such slot exists on the standby before
sync.

--
With Regards,
Amit Kapila.
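
The rule discussed in this message can be illustrated with a toy model. The sketch below is not PostgreSQL code: `Slot`, `SlotSyncError`, and `sync_slot` are hypothetical names invented for illustration, and the only assumption taken from the thread is that a standby refuses to sync a slot when a local slot with the same name exists and is not marked "synced". It does not model timelines or WAL divergence, which is the open concern raised above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Slot:
    """Toy stand-in for a logical replication slot (hypothetical)."""
    name: str
    synced: bool  # True only if the slot was created by slot synchronization


class SlotSyncError(Exception):
    """Raised when the toy sync refuses to touch a local slot."""


def sync_slot(local_slots: Dict[str, Slot], remote_name: str) -> Slot:
    """Sync one slot from the primary into the standby's local slots.

    A local slot with the same name that is not marked synced may have been
    created on the standby for decoding, so it is never overwritten.
    """
    existing = local_slots.get(remote_name)
    if existing is not None and not existing.synced:
        raise SlotSyncError(
            "exiting from slot synchronization because same name slot "
            f'"{remote_name}" already exists on the standby'
        )
    slot = Slot(remote_name, synced=True)
    local_slots[remote_name] = slot
    return slot


# node1 was the original primary, so its slot was created locally.
node1_slots = {"test_sub": Slot("test_sub", synced=False)}
# node2 received the slot through synchronization while it was a standby.
node2_slots = {"test_sub": Slot("test_sub", synced=True)}

# After the switchover, node1 (new standby) cannot sync from node2.
try:
    sync_slot(node1_slots, "test_sub")
except SlotSyncError as err:
    print(err)

# After a switch-back, node2 (standby again) is allowed to sync.
print(sync_slot(node2_slots, "test_sub"))
```

In this model the user has to drop `test_sub` on node1 before it can act as a standby with slot synchronization, which matches the behaviour proposed for the first version.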