Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Synchronizing slots from primary to standby
Msg-id afe4ab6c-dde3-48ea-acd8-6f6052c7b8fd@gmail.com
In response to Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
Hi,

On 10/27/23 11:56 AM, shveta malik wrote:
> On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
>>
>> Hi,
>>
>> On 10/25/23 5:00 AM, shveta malik wrote:
>>> On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand
>>> <bertranddrouvot.pg@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 10/23/23 2:56 PM, shveta malik wrote:
>>>>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand
>>>>> <bertranddrouvot.pg@gmail.com> wrote:
>>>>
>>>>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there
>>>>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior
>>>>>> for V1?
>>>>>>
>>>>>
>>>>> I think for the slotsync workers case, we should reduce the naptime in
>>>>> the launcher to say 30sec and retain the default one of 3mins for
>>>>> subscription apply workers. Thoughts?
>>>>>
>>>>
>>>> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new
>>>> API on the standby that would refresh the list of sync slot at wish, thoughts?
>>>>
>>>
>>> Do you mean API to refresh list of DBIDs rather than sync-slots?
>>> As per current design, launcher gets DBID lists for all the failover
>>> slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE.
>>
>> I mean an API to get a newly created slot on the primary being created/synced on
>> the standby at wish.
>>
>> Also let's imagine this scenario:
>>
>> - create logical_slot1 on the primary (and don't start using it)
>>
>> Then on the standby we'll get things like:
>>
>> 2023-10-25 08:33:36.897 UTC [740298] LOG:  waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752) to pass local slot LSN (0/C0049530) and and catalog xmin (754)
>>
>> That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot
>> restart_lsn to a value < at the corresponding restart_lsn slot on the primary.
>>
>> - create logical_slot2 on the primary (and start using it)
>>
>> Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary
>> that would produce things like:
>> 2023-10-25 08:41:35.508 UTC [740298] LOG:  wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalog xmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754)
> 
> 
> Slight correction to above. As soon as we start activity on
> logical_slot2, it will impact all the slots on primary, as the WALs
> are consumed by all the slots. So even if there is activity on
> logical_slot2, logical_slot1 creation on standby will be unblocked and
> it will then move to logical_slot2 creation. eg:
> 
> --on standby:
> 2023-10-27 15:15:46.069 IST [696884] LOG:  waiting for remote slot
> "mysubnew1_1" LSN (0/3C97970) and catalog xmin (756) to pass local
> slot LSN (0/3C979A8) and and catalog xmin (756)
> 
> on primary:
> newdb1=# select now();
>                 now
> ----------------------------------
>   2023-10-27 15:15:51.504835+05:30
> (1 row)
> 
> --activity on mysubnew1_3
> newdb1=# insert into tab1_3 values(1);
> INSERT 0 1
> newdb1=# select now();
>                 now
> ----------------------------------
>   2023-10-27 15:15:54.651406+05:30
> 
> 
> --on standby, mysubnew1_1 is unblocked.
> 2023-10-27 15:15:56.223 IST [696884] LOG:  wait over for remote slot
> "mysubnew1_1" as its LSN (0/3C97A18) and catalog xmin (757) has now
> passed local slot LSN (0/3C979A8) and catalog xmin (756)
> 
> My Setup:
> mysubnew1_1 -->mypubnew1_1 -->tab1_1
> mysubnew1_3 -->mypubnew1_3-->tab1_3
> 

Agree with your test case, but in my case I was not using pub/sub.

I was not clear. When I said:

>> - create logical_slot1 on the primary (and don't start using it)

I meant don't start decoding from it (for example with pg_recvlogical or
pg_logical_slot_get_changes()).

With pub/sub, the "don't start using it" condition is not satisfied.

My test case is:

"
SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true);
SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true);
pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f -
"
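
To make the two log messages quoted above concrete, here is a minimal Python
sketch of the comparison they appear to describe. It is not the patch's code:
parse_lsn and remote_has_caught_up are hypothetical names, and the check
assumes the wait ends once the remote slot's LSN and catalog xmin are both at
or past the local slot's values. Transaction ID wraparound is ignored.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/C00316A0' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def remote_has_caught_up(remote_lsn: str, remote_xmin: int,
                         local_lsn: str, local_xmin: int) -> bool:
    """Return True when the remote slot is at or past the local slot.

    Assumption: plain integer comparison of xmin values; real transaction
    IDs wrap around, which this sketch does not handle.
    """
    lsn_ok = parse_lsn(remote_lsn) >= parse_lsn(local_lsn)
    xmin_ok = remote_xmin >= local_xmin
    return lsn_ok and xmin_ok


if __name__ == "__main__":
    # Values from the "waiting for remote slot" message for logical_slot1.
    print(remote_has_caught_up("0/C00316A0", 752, "0/C0049530", 754))
    # Values from the later "wait over for remote slot" message.
    print(remote_has_caught_up("0/C005FFD8", 756, "0/C0049530", 754))
```

With the values from the logs, the first call reports that the remote slot is
still behind and the second that it has caught up, which matches the
"waiting" and "wait over" messages.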

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


