Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | da2d3264-7049-48b1-914a-9c8631c8e384@gmail.com |
In response to | Re: Synchronizing slots from primary to standby (Ajin Cherian <itsajin@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby; Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
Hi,

On 10/24/23 7:44 AM, Ajin Cherian wrote:
> On Mon, Oct 23, 2023 at 11:22 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
>>
>> @@ -602,6 +602,9 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>>             SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn);
>>         }
>>
>> +       /* set failover in the slot, as requested */
>> +       slot->data.failover = ctx->failover;
>> +
>>
>> I think we can get rid of this change in CreateDecodingContext().
>>
> Yes, I too noticed this in my testing, however just removing this from
> CreateDecodingContext will not allow us to change the slot's failover flag
> using Alter subscription.

Oh right.

> I am thinking of moving this change to
> StartLogicalReplication prior to calling CreateDecodingContext by
> parsing the command options in StartReplicationCmd
> without adding it to the LogicalDecodingContext.
>

Yeah, that looks like a good place to update "failover".

While doing more testing, I have a couple of remarks about the current behavior.

1) Let's imagine that:

- there is no standby
- standby_slot_names is set to a valid slot on the primary (but, due to the above, not linked to any standby)
- then a CREATE SUBSCRIPTION on a subscriber WITH (failover = true) would start the synchronisation but never finish it (meaning it leaves a "synchronisation" slot like "pg_32811_sync_24576_7293415241672430356", coming from ReplicationSlotNameForTablesync(), in place).

That's expected, but maybe we should emit a warning in WalSndWaitForStandbyConfirmation() on the primary when a slot listed in standby_slot_names is not active (i.e. has no active_pid attached to it)?

2) When we create a subscription, another slot is created during the subscription synchronization, named like "pg_16397_sync_16388_7293447291374081805" (coming from ReplicationSlotNameForTablesync()).

This extra slot appears to have failover set to true as well.
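A minimal sketch of the alternative, where such tablesync slots are always requested with failover = false while the main slot keeps the subscription's setting. This is plain C, not code from the patch: SlotCreateRequest, make_apply_slot_request() and make_tablesync_slot_request() are hypothetical names, SKETCH_NAMEDATALEN stands in for NAMEDATALEN, and the name simply follows the pg_<subscription oid>_sync_<relation oid>_<system identifier> pattern that ReplicationSlotNameForTablesync() produces for the names above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's NAMEDATALEN (64 by default). */
#define SKETCH_NAMEDATALEN 64

/* Hypothetical description of a slot the subscriber asks the primary to create. */
typedef struct SlotCreateRequest
{
    char        name[SKETCH_NAMEDATALEN];
    bool        failover;
} SlotCreateRequest;

/*
 * Main apply slot: inherits the subscription's failover option,
 * i.e. CREATE SUBSCRIPTION ... WITH (failover = true).
 */
static SlotCreateRequest
make_apply_slot_request(const char *slotname, bool sub_failover)
{
    SlotCreateRequest req;

    assert(slotname != NULL);
    snprintf(req.name, sizeof(req.name), "%s", slotname);
    req.failover = sub_failover;
    return req;
}

/*
 * Tablesync slot: pg_<suboid>_sync_<relid>_<sysid>.  In this sketch the
 * failover flag is always false because the slot is short-lived; if a
 * failover happens mid-sync, the subscription would be re-launched
 * instead of resuming from a synced copy of this slot.
 */
static SlotCreateRequest
make_tablesync_slot_request(unsigned int suboid, unsigned int relid,
                            uint64_t sysid)
{
    SlotCreateRequest req;

    snprintf(req.name, sizeof(req.name), "pg_%u_sync_%u_%" PRIu64,
             suboid, relid, sysid);
    req.failover = false;       /* deliberately ignore the subscription's option */
    return req;
}
```

With the values from the slot name above, make_tablesync_slot_request(16397, 16388, 7293447291374081805) rebuilds "pg_16397_sync_16388_7293447291374081805" with failover = false, so the standby would never try to sync it.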
So, if the standby refreshes the list of slots to sync while the subscription is still synchronizing, we'd see things like this on the standby:

LOG: waiting for remote slot "mysub" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C0034840) and and catalog xmin (756)
LOG: wait over for remote slot "mysub" as its LSN (0/C00368B0) and catalog xmin (756) has now passed local slot LSN (0/C0034840) and catalog xmin (756)
LOG: waiting for remote slot "pg_16397_sync_16388_7293447291374081805" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C00368E8) and and catalog xmin (756)
WARNING: slot "pg_16397_sync_16388_7293447291374081805" disappeared from the primary, aborting slot creation

I'm not sure this "pg_16397_sync_16388_7293447291374081805" slot should have failover set to true. If there is a failover during the subscription creation, wouldn't it be better to re-launch the subscription instead?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
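To make the log lines in 2) easier to read, here is a sketch of the wait condition they appear to describe: the standby keeps waiting while the remote slot's LSN is behind the local slot's LSN, or while its catalog xmin precedes the local one. The condition is inferred from the messages, not copied from the patch. parse_lsn(), xid_precedes() and must_wait() are hypothetical helpers, XLogRecPtrSketch and TransactionIdSketch stand in for XLogRecPtr and TransactionId, and xid_precedes() is modeled on TransactionIdPrecedes() for normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtrSketch;      /* stand-in for XLogRecPtr */
typedef uint32_t TransactionIdSketch;   /* stand-in for TransactionId */

/* Parse the "%X/%X" text form used in the log lines, e.g. "0/C0034808". */
static bool
parse_lsn(const char *str, XLogRecPtrSketch *lsn)
{
    unsigned int hi;
    unsigned int lo;
    int          consumed = -1;

    assert(str != NULL && lsn != NULL);
    /* Require exactly "<hex>/<hex>" with nothing trailing. */
    if (sscanf(str, "%X/%X%n", &hi, &lo, &consumed) != 2 ||
        consumed < 0 || str[consumed] != '\0')
        return false;
    *lsn = ((XLogRecPtrSketch) hi << 32) | lo;
    return true;
}

/* Wraparound-aware "a is older than b", modeled on TransactionIdPrecedes(). */
static bool
xid_precedes(TransactionIdSketch a, TransactionIdSketch b)
{
    return (int32_t) (a - b) < 0;
}

/* True while the remote slot has not yet caught up with the local slot. */
static bool
must_wait(XLogRecPtrSketch remote_lsn, TransactionIdSketch remote_xmin,
          XLogRecPtrSketch local_lsn, TransactionIdSketch local_xmin)
{
    return remote_lsn < local_lsn || xid_precedes(remote_xmin, local_xmin);
}
```

With the logged values, must_wait() is true for "mysub" at 0/C0034808 against 0/C0034840, false once it reaches 0/C00368B0, and true for the tablesync slot at 0/C0034808 against 0/C00368E8. That is consistent with the standby still waiting on the tablesync slot when that slot was dropped on the primary.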