Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id 773f7e94-13c0-4aaf-967d-1f005e7d48d5@gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
Hi,

On 10/31/23 10:37 AM, shveta malik wrote:
> On Fri, Oct 27, 2023 at 8:43 PM Drouvot, Bertrand
>>
>> Agree with your test case, but in my case I was not using pub/sub.
>>
>> I was not clear, so when I said:
>>
>>>> - create logical_slot1 on the primary (and don't start using it)
>>
>> I meant don't start decoding from it (like using pg_recvlogical() or
>> pg_logical_slot_get_changes()).
>>
>> By using pub/sub the "don't start using it" is not satisfied.
>>
>> My test case is:
>>
>> "
>> SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true);
>> SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true);
>> pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f -
>> "
>>
> 
> Okay, I am able to reproduce it now. Thanks for clarification. I have
> tried to change the algorithm as per suggestion by Amit in [1]
> 
> [1]:
https://www.postgresql.org/message-id/CAA4eK1KBL0110gamQfc62X%3D5JV8-Qjd0dw0Mq0o07cq6kE%2Bq%3Dg%40mail.gmail.com

Thanks!

> 
> This is not full proof solution but optimization over first one. Now
> in any sync-cycle, we take 2 attempts for slots-creation (if any slots
> are available to be created). In first attempt, we do not wait
> indefinitely on inactive slots, we wait only for a fixed amount of
> time and if remote-slot is still behind, then we add that to the
> pending list and move to the next slot. Once we are done with first
> attempt, in second attempt, we go for the pending ones and now we wait
> on each of them until the primary catches up.

Aren't we "just" postponing the "issue"? I mean if there is really no activity
on, say, the first created slot, then once we move to the second attempt then any newly
created slot from that time would wait to be synced forever, no?

Looking at V30:

+       /* Update lsns of slot to remote slot's current position */
+       local_slot_update(remote_slot);
+       ReplicationSlotPersist();
+
+       ereport(LOG, errmsg("created slot \"%s\" locally", remote_slot->name));

I think this message is confusing as the slot has been created before it, here:

+   else
+   {
+       TransactionId xmin_horizon = InvalidTransactionId;
+       ReplicationSlot *slot;
+
+       ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL,
+                             remote_slot->two_phase, false);

So that it shows up in pg_replication_slots before this message is emitted (and that
specially true/worst for non active slots).

Maybe something like "newly locally created slot XXX has been synced..."?

While at it, would that make sense to move

+       slot->data.failover = true;

once we stop waiting for this slot? I think that would avoid confusion if one
query pg_replication_slots while we are still waiting for this slot to be synced,
thoughts? (currently we can see pg_replication_slots.synced_slot set to true
while we are still waiting).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Nikita Malakhov
Date:
Subject: Re: RFC: Pluggable TOAST
Next
From: Amit Kapila
Date:
Subject: Re: A recent message added to pg_upgade