Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | ZcHX4SXkqtGe27a6@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | RE: Synchronizing slots from primary to standby ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses | Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
Hi,

On Tue, Feb 06, 2024 at 03:19:11AM +0000, Zhijie Hou (Fujitsu) wrote:
> On Friday, February 2, 2024 2:03 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote:
> > > Attached v75 patch-set. Changes are:
> > >
> > > 1) Re-arranged the patches:
> > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are
> > > separated out in v75-001 as those are independent changes.
> > > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special
> > > process' and 'App-name changes' are now merged to single patch which
> > > makes v75-002.
> > > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation
> > > Document' patches are maintained as is (v75-003 and v75-004 now).
> >
> > Thanks!
> >
> > I only looked at the commit message for v75-0002 and see that it has changed
> > since the comment done in [1], but it still does not look correct to me.
> >
> > "
> > If a logical slot on the primary is valid but is invalidated on the standby, then
> > that slot is dropped and recreated on the standby in next sync-cycle provided
> > the slot still exists on the primary server. It is okay to recreate such slots as long
> > as these are not consumable on the standby (which is the case currently). This
> > situation may occur due to the following reasons:
> > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL
> > records from the restart_lsn of the slot.
> > - primary_slot_name is temporarily reset to null and the physical slot is
> > removed.
> > - The primary changes wal_level to a level lower than logical.
> > "
> >
> > If a logical decoding slot "still exists on the primary server" then the primary
> > can not change the wal_level to lower than logical, one would get something
> > like:
> >
> > "FATAL: logical replication slot "logical_slot" exists, but wal_level < logical"
> >
> > and then slots won't get invalidated on the standby. I've the feeling that the
> > wal_level conflict part may need to be explained separately? (I think it's not
> > possible that they end up being re-created on the standby for this conflict,
> > they will be simply removed as it would mean the counterpart one on the
> > primary does not exist anymore).
>
> This is possible in some extreme cases, because the slot is synced
> asynchronously.
>
> For example: If on the primary the wal_level is changed to 'replica'

It means that all the logical slots have been dropped on the primary (if not,
it's not possible to change it to a level < logical).

> and then
> changed back to 'logical', the standby would receive two XLOG_PARAMETER_CHANGE
> wals. And before the standby replay these wals, user can create a failover slot

And now it is re-created.

So the slot has been dropped and recreated on the primary, so it's kind of
expected it is also dropped and re-created on the standby (should it be
invalidated or not).

> Although I think it doesn't seem a real world case, so I am not sure is it worth
> separate explanation.

Yeah, I don't think your example is worth a separate explanation also because
it's expected to see the slot being dropped / re-created anyway (see above).

That said, I still think the commit message needs some re-wording, what about?

=====
If a logical slot on the primary is valid but is invalidated on the standby,
then that slot is dropped and can be recreated on the standby in next
pg_sync_replication_slots() call provided the slot still exists on the primary
server. It is okay to recreate such slots as long as these are not consumable
on the standby (which is the case currently). This situation may occur due to
the following reasons:

- The max_slot_wal_keep_size on the standby is insufficient to retain WAL
  records from the restart_lsn of the slot.
- primary_slot_name is temporarily reset to null and the physical slot is
  removed.

Changing the primary wal_level to a level lower than logical is only possible
if the logical slots are removed on the primary, so it's expected to see the
slots being removed on the standby too (and re-created if they are re-created
on the primary).
=====

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
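The drop / re-create rules in the proposed commit message wording can be illustrated with a small toy model. This is only a sketch in Python, not PostgreSQL code: `Slot`, `Server`, `set_wal_level` and `sync_slots` are hypothetical names made up for the illustration, and it assumes for simplicity that a drop and the matching re-create happen within one sync call. It models two things from the discussion: wal_level cannot go below logical while logical slots exist on the primary, and a slot that is invalidated on the standby is dropped and re-created only if its counterpart still exists on the primary.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    invalidated: bool = False


@dataclass
class Server:
    wal_level: str = "logical"
    slots: dict = field(default_factory=dict)  # slot name -> Slot


def set_wal_level(primary: Server, level: str) -> None:
    """Toy rule: refuse a level below 'logical' while logical slots exist."""
    if level != "logical" and primary.slots:
        raise RuntimeError("logical replication slot exists, but wal_level < logical")
    primary.wal_level = level


def sync_slots(primary: Server, standby: Server) -> list:
    """One hypothetical sync call; returns the actions taken."""
    actions = []
    # Pass 1: drop standby slots that are gone on the primary, or that are
    # invalidated locally while the primary copy is still valid.
    for name in list(standby.slots):
        remote = primary.slots.get(name)
        if remote is None or (standby.slots[name].invalidated and not remote.invalidated):
            del standby.slots[name]
            actions.append(("drop", name))
    # Pass 2: (re-)create every valid primary slot missing on the standby.
    for name, remote in primary.slots.items():
        if name not in standby.slots and not remote.invalidated:
            standby.slots[name] = Slot(name)
            actions.append(("create", name))
    return actions
```

In this model, lowering wal_level only succeeds once the primary's slots are gone, so the next sync simply removes them on the standby; they come back only if they are created again on the primary.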