Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | 773f7e94-13c0-4aaf-967d-1f005e7d48d5@gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
Hi, On 10/31/23 10:37 AM, shveta malik wrote: > On Fri, Oct 27, 2023 at 8:43 PM Drouvot, Bertrand >> >> Agree with your test case, but in my case I was not using pub/sub. >> >> I was not clear, so when I said: >> >>>> - create logical_slot1 on the primary (and don't start using it) >> >> I meant don't start decoding from it (like using pg_recvlogical() or >> pg_logical_slot_get_changes()). >> >> By using pub/sub the "don't start using it" is not satisfied. >> >> My test case is: >> >> " >> SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true); >> SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true); >> pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f - >> " >> > > Okay, I am able to reproduce it now. Thanks for clarification. I have > tried to change the algorithm as per suggestion by Amit in [1] > > [1]: https://www.postgresql.org/message-id/CAA4eK1KBL0110gamQfc62X%3D5JV8-Qjd0dw0Mq0o07cq6kE%2Bq%3Dg%40mail.gmail.com Thanks! > > This is not full proof solution but optimization over first one. Now > in any sync-cycle, we take 2 attempts for slots-creation (if any slots > are available to be created). In first attempt, we do not wait > indefinitely on inactive slots, we wait only for a fixed amount of > time and if remote-slot is still behind, then we add that to the > pending list and move to the next slot. Once we are done with first > attempt, in second attempt, we go for the pending ones and now we wait > on each of them until the primary catches up. Aren't we "just" postponing the "issue"? I mean if there is really no activity on, say, the first created slot, then once we move to the second attempt then any newly created slot from that time would wait to be synced forever, no? Looking at V30: + /* Update lsns of slot to remote slot's current position */ + local_slot_update(remote_slot); + ReplicationSlotPersist(); + + ereport(LOG, errmsg("created slot \"%s\" locally", remote_slot->name)); I think this message is confusing as the slot has been created before it, here: + else + { + TransactionId xmin_horizon = InvalidTransactionId; + ReplicationSlot *slot; + + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, + remote_slot->two_phase, false); So that it shows up in pg_replication_slots before this message is emitted (and that specially true/worst for non active slots). Maybe something like "newly locally created slot XXX has been synced..."? While at it, would that make sense to move + slot->data.failover = true; once we stop waiting for this slot? I think that would avoid confusion if one query pg_replication_slots while we are still waiting for this slot to be synced, thoughts? (currently we can see pg_replication_slots.synced_slot set to true while we are still waiting). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: