Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot() - Mailing list pgsql-hackers

From Alexey Kondratov
Subject Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot()
Date
Msg-id 2cd70350b7ac65dae647d175d62a258c@postgrespro.ru
Whole thread Raw
In response to Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot()  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot()
List pgsql-hackers
On 2020-06-19 21:57, Fujii Masao wrote:
> On 2020/06/19 23:20, Alexey Kondratov wrote:
>> On 2020-06-19 03:59, Michael Paquier wrote:
>>> On Thu, Jun 18, 2020 at 03:39:09PM +0300, Vyacheslav Makarov wrote:
>>>> If the WAL segment for the specified restart_lsn (STOP_LSN of the 
>>>> backup)
>>>> exists, then the function will create a physical replication slot 
>>>> and will
>>>> keep all the WAL segments required by the replica to catch up with 
>>>> the
>>>> primary. Otherwise, it returns error, which means that the required 
>>>> WAL
>>>> segments have been already utilised, so we do need to take a new 
>>>> backup.
>>>> Without passing this newly added parameter
>>>> pg_create_physical_replication_slot() works as before.
>>>> 
>>>> What do you think about this?
> 
> Currently pg_create_physical_replication_slot() and 
> CREATE_REPLICATION_SLOT
> replication command seem to be "idential". So if we add new option into 
> one,
> we should add it also into another?
> 

I wonder how it could be used via the replication protocol, but probably 
this option should be added there as well for consistency.

> 
> What happen if future LSN is specified in restart_lsn? With the patch,
> in this case, if the segment at that LSN exists (e.g., because it's 
> recycled
> one), the slot seems to be successfully created. However if the LSN is
> far future and the segment doesn't exist, the creation of slot seems to 
> fail.
> This behavior looks fragile and confusing. We should accept future LSN
> whether its segment currently exists or not?
> 

But what about a possible timeline switch? If we allow specifying it as 
further in the future as one wanted, then appropriate segment with 
specified LSN may be created in the different timeline if it would be 
switched, so it may be misleading. I am not even sure about allowing 
future LSN for existing segments, since PITR / timeline switch may occur 
just after the slot creation, so the pointer may never be valid. Would 
it be better to completely disallow future LSN?

And here I noticed another moment in the patch. TimeLineID of the last 
restart/checkpoint is used to detect whether WAL segment file exists or 
not. It means that if we try to create a slot just after a timeline 
switch, then we could not specify the oldest LSN actually available on 
the disk, since it may be from the previous timeline. One can use LSN 
only within the current timeline. It seems to be fine, but should be 
covered in the docs.

> 
> +    if (!RecoveryInProgress() && !SlotIsLogical(MyReplicationSlot))
> 
> With the patch, the given restart_lsn seems to be ignored during 
> recovery.
> Why?
> 

I have the same question, not sure that this is needed here. It looks 
more like a forgotten copy-paste from ReplicationSlotReserveWal().

>>> 
>>> I think that this was discussed in the past (perhaps one of the
>>> threads related to WAL advancing actually?),
>>> 
>> 
>> I have searched through the archives a bit and found one thread 
>> related to slots advancing [1]. It was dedicated to a problem of 
>> advancing slots which do not reserve WAL yet, if I get it correctly. 
>> Although it is somehow related to the topic, it was a slightly 
>> different issue, IMO.
>> 
>>> 
>>> and this stuff is full of
>>> holes when it comes to think about error handling with checkpoints
>>> running in parallel, potentially doing recycling of segments you 
>>> would
>>> expect to be around based on your input value for restart_lsn *while*
>>> pg_create_physical_replication_slot() is still running and
>>> manipulating the on-disk slot information.
>>> ...
>> 
>> These are the right concerns, but all of them should be applicable to 
>> the pg_create_physical_replication_slot() + immediately_reserve == 
>> true in the same way, doesn't it? I think so, since in that case we 
>> are doing a pretty similar thing — trying to reserve some WAL segment 
>> that may be concurrently deleted.
>> 
>> And this is exactly the reason why ReplicationSlotReserveWal() does it 
>> in several steps in a loop:
>> 
>> 1. Creates a slot with some restart_lsn.
>> 2. Does ReplicationSlotsComputeRequiredLSN() to prevent removal of the 
>> WAL segment with this restart_lsn.
>> 3. Checks that required WAL segment is still there.
>> 4. Repeat if this attempt to prevent WAL removal has failed.
> 
> What happens if concurrent checkpoint decides to remove the segment
> at restart_lsn before #2 and then actually removes it after #3?
> The replication slot is successfully created with the given 
> restart_lsn,
> but the reserved segment has already been removed?
> 

I though about it a bit more and it seems that yes, there is a race even 
for a current pg_create_physical_replication_slot() + 
immediately_reserve == true, i.e. ReplicationSlotReserveWal(). However, 
the chance is very subtle since we take a current GetRedoRecPtr() there. 
Probably one could reproduce it with wal_keep_segments = 1 by holding / 
releasing backend doing the slot creation and checkpointer with gdb, but 
not sure that it is an issue anywhere in the real world.

Maybe I am wrong, but it is not clear for me why current 
ReplicationSlotReserveWal() routine does not have that race. I will try 
to reproduce it though.

Things get worse when we allow specifying an older LSN, since it has a 
higher chances to be at the horizon of deletion by checkpointer. Anyway, 
if I get it correctly, with a current patch slot will be created 
successfully, but will be obsolete and should be invalidated by the next 
checkpoint.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: Parallel Seq Scan vs kernel read ahead
Next
From: Jeff Davis
Date:
Subject: Re: Default setting for enable_hashagg_disk