Thread: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

[PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

From: Vyacheslav Makarov
Hello, hackers.

I would like to propose a patch that allows passing one extra 
parameter to pg_create_physical_replication_slot() — restart_lsn. It 
could be very helpful if we already have a backup with a STOP_LSN from 
a couple of hours in the past and we want to quickly verify whether it 
is possible to create a replica from this backup or not.

If the WAL segment for the specified restart_lsn (the STOP_LSN of the 
backup) exists, then the function will create a physical replication 
slot and will keep all the WAL segments required by the replica to catch 
up with the primary. Otherwise, it returns an error, which means that 
the required WAL segments have already been recycled, so we need to take 
a new backup. Without this new parameter, 
pg_create_physical_replication_slot() works as before.
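
In pseudo-code, the check amounts to something like this (a simplified 
sketch for illustration only, not the exact code from the attachment; 
check_requested_restart_lsn is a made-up name, while XLByteToSeg(), 
XLogGetLastRemovedSegno() and wal_segment_size are existing backend 
facilities):

/* Sketch only; assumes backend context (e.g. slotfuncs.c). */
#include "postgres.h"
#include "access/xlog.h"
#include "access/xlog_internal.h"

/* Error out if the segment holding restart_lsn is already gone. */
static void
check_requested_restart_lsn(XLogRecPtr restart_lsn)
{
    XLogSegNo   segno;

    XLByteToSeg(restart_lsn, segno, wal_segment_size);

    /* Segments up to XLogGetLastRemovedSegno() have been removed. */
    if (XLogGetLastRemovedSegno() >= segno)
        ereport(ERROR,
                (errmsg("WAL segment for restart_lsn %X/%X has already been removed",
                        (uint32) (restart_lsn >> 32), (uint32) restart_lsn)));
}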

What do you think about this?

-- 
Vyacheslav Makarov

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

Re: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

From: Michael Paquier

On Thu, Jun 18, 2020 at 03:39:09PM +0300, Vyacheslav Makarov wrote:
> If the WAL segment for the specified restart_lsn (STOP_LSN of the backup)
> exists, then the function will create a physical replication slot and will
> keep all the WAL segments required by the replica to catch up with the
> primary. Otherwise, it returns error, which means that the required WAL
> segments have been already utilised, so we do need to take a new backup.
> Without passing this newly added parameter
> pg_create_physical_replication_slot() works as before.
>
> What do you think about this?

I think that this was discussed in the past (perhaps one of the
threads related to WAL advancing actually?), and this stuff is full of
holes when it comes to thinking about error handling with checkpoints
running in parallel, potentially doing recycling of segments you would
expect to be around based on your input value for restart_lsn *while*
pg_create_physical_replication_slot() is still running and
manipulating the on-disk slot information.  I suspect that this also
breaks a couple of assumptions behind concurrent calls of the minimum
LSN calculated across slots when a caller sees fit to recompute the
thresholds (WAL senders mainly here, depending on the replication
activity).
--
Michael

Attachment

Re: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

From: Alexey Kondratov
On 2020-06-19 03:59, Michael Paquier wrote:
> On Thu, Jun 18, 2020 at 03:39:09PM +0300, Vyacheslav Makarov wrote:
>> If the WAL segment for the specified restart_lsn (STOP_LSN of the 
>> backup)
>> exists, then the function will create a physical replication slot and 
>> will
>> keep all the WAL segments required by the replica to catch up with the
>> primary. Otherwise, it returns error, which means that the required 
>> WAL
>> segments have been already utilised, so we do need to take a new 
>> backup.
>> Without passing this newly added parameter
>> pg_create_physical_replication_slot() works as before.
>> 
>> What do you think about this?
> 
> I think that this was discussed in the past (perhaps one of the
> threads related to WAL advancing actually?),
> 

I have searched through the archives a bit and found one thread related 
to slots advancing [1]. It was dedicated to a problem with advancing 
slots that do not reserve WAL yet, if I understand it correctly. 
Although it is somewhat related to the topic, it was a slightly 
different issue, IMO.

> 
> and this stuff is full of
> holes when it comes to think about error handling with checkpoints
> running in parallel, potentially doing recycling of segments you would
> expect to be around based on your input value for restart_lsn *while*
> pg_create_physical_replication_slot() is still running and
> manipulating the on-disk slot information. I suspect that this also
> breaks a couple of assumptions behind concurrent calls of the minimum
> LSN calculated across slots when a caller sees fit to recompute the
> thresholds (WAL senders mainly here, depending on the replication
> activity).
> 

These are the right concerns, but all of them should apply to 
pg_create_physical_replication_slot() + immediately_reserve == true 
in the same way, shouldn't they? I think so, since in that case we are 
doing a pretty similar thing — trying to reserve some WAL segment that 
may be concurrently deleted.

And this is exactly the reason why ReplicationSlotReserveWal() does it 
in several steps in a loop:

1. Creates a slot with some restart_lsn.
2. Does ReplicationSlotsComputeRequiredLSN() to prevent removal of the 
WAL segment with this restart_lsn.
3. Checks that the required WAL segment is still there.
4. Repeats if this attempt to prevent WAL removal has failed (see the 
trimmed sketch below).
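
For reference, the physical-slot path of that loop looks roughly like 
this (a trimmed sketch, not the verbatim source; slot stands for 
MyReplicationSlot):

    for (;;)
    {
        XLogRecPtr  restart_lsn;
        XLogSegNo   segno;

        /* 1. pick a restart_lsn and store it in the slot */
        restart_lsn = GetRedoRecPtr();
        SpinLockAcquire(&slot->mutex);
        slot->data.restart_lsn = restart_lsn;
        SpinLockRelease(&slot->mutex);

        /* 2. publish the new minimum so WAL removal takes it into account */
        ReplicationSlotsComputeRequiredLSN();

        /* 3. check that the segment holding restart_lsn is still there ... */
        XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
        if (XLogGetLastRemovedSegno() < segno)
            break;

        /* 4. ... otherwise loop and retry with a newer restart_lsn */
    }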

I guess that the only difference in the proposed scenario is that we do 
not get a chance for step 4, since we need one specific restart_lsn 
rather than just any recent one, i.e. in this case we have to:

1. Create a slot with the restart_lsn specified by the user.
2. Do ReplicationSlotsComputeRequiredLSN() to prevent WAL removal.
3. Check that the required WAL segment is still there and report an 
ERROR to the user if it is not (sketched below).
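
I.e. the loop above collapses into a single pass, roughly like this 
(again only a sketch; user_restart_lsn and segno are placeholder locals):

    /* 1. store the user-specified LSN in the slot */
    SpinLockAcquire(&slot->mutex);
    slot->data.restart_lsn = user_restart_lsn;
    SpinLockRelease(&slot->mutex);

    /* 2. prevent further WAL removal from this point on */
    ReplicationSlotsComputeRequiredLSN();

    /* 3. if the segment holding that LSN is already gone, give up */
    XLByteToSeg(user_restart_lsn, segno, wal_segment_size);
    if (XLogGetLastRemovedSegno() >= segno)
        ereport(ERROR,
                (errmsg("could not reserve WAL at %X/%X: segment has already been removed",
                        (uint32) (user_restart_lsn >> 32), (uint32) user_restart_lsn)));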

I have eyeballed the attached patch and it looks like it is doing 
exactly that, so the issues with concurrent deletion are not obvious to 
me. Otherwise, the same issues should exist for 
pg_create_physical_replication_slot() + immediately_reserve == true with 
the current master implementation.

[1] 
https://www.postgresql.org/message-id/flat/20180626071305.GH31353%40paquier.xyz


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

From: Fujii Masao

On 2020/06/19 23:20, Alexey Kondratov wrote:
> On 2020-06-19 03:59, Michael Paquier wrote:
>> On Thu, Jun 18, 2020 at 03:39:09PM +0300, Vyacheslav Makarov wrote:
>>> If the WAL segment for the specified restart_lsn (STOP_LSN of the backup)
>>> exists, then the function will create a physical replication slot and will
>>> keep all the WAL segments required by the replica to catch up with the
>>> primary. Otherwise, it returns error, which means that the required WAL
>>> segments have been already utilised, so we do need to take a new backup.
>>> Without passing this newly added parameter
>>> pg_create_physical_replication_slot() works as before.
>>>
>>> What do you think about this?

Currently pg_create_physical_replication_slot() and the CREATE_REPLICATION_SLOT
replication command seem to be "identical". So if we add a new option to one,
we should also add it to the other?


What happens if a future LSN is specified as restart_lsn? With the patch,
in this case, if the segment at that LSN exists (e.g., because it's a recycled
one), the slot seems to be created successfully. However, if the LSN is
far in the future and the segment doesn't exist, the creation of the slot
seems to fail. This behavior looks fragile and confusing. Should we accept
a future LSN whether or not its segment currently exists?


+    if (!RecoveryInProgress() && !SlotIsLogical(MyReplicationSlot))

With the patch, the given restart_lsn seems to be ignored during recovery.
Why?

>>
>> I think that this was discussed in the past (perhaps one of the
>> threads related to WAL advancing actually?),
>>
> 
> I have searched through the archives a bit and found one thread related 
> to slots advancing [1]. It was dedicated to a problem of advancing slots 
> which do not reserve WAL yet, if I get it correctly. Although it is 
> somehow related to the topic, it was a slightly different issue, IMO.
> 
>>
>> and this stuff is full of
>> holes when it comes to think about error handling with checkpoints
>> running in parallel, potentially doing recycling of segments you would
>> expect to be around based on your input value for restart_lsn *while*
>> pg_create_physical_replication_slot() is still running and
>> manipulating the on-disk slot information. I suspect that this also
>> breaks a couple of assumptions behind concurrent calls of the minimum
>> LSN calculated across slots when a caller sees fit to recompute the
>> thresholds (WAL senders mainly here, depending on the replication
>> activity).
>>
> 
> These are the right concerns, but all of them should be applicable to the 
> pg_create_physical_replication_slot() + immediately_reserve == true in the 
> same way, doesn't it? I think so, since in that case we are doing a pretty 
> similar thing — trying to reserve some WAL segment that may be concurrently 
> deleted.
> 
> And this is exactly the reason why ReplicationSlotReserveWal() does it in several steps in a loop:
> 
> 1. Creates a slot with some restart_lsn.
> 2. Does ReplicationSlotsComputeRequiredLSN() to prevent removal of the WAL segment with this restart_lsn.
> 3. Checks that required WAL segment is still there.
> 4. Repeat if this attempt to prevent WAL removal has failed.

What happens if a concurrent checkpoint decides to remove the segment
at restart_lsn before #2 and then actually removes it after #3?
The replication slot would be created successfully with the given restart_lsn,
but the reserved segment would already have been removed?


> I guess that the only difference in the case of proposed scenario is that 
> we do not have a chance for step 4, since we do need some specific 
> restart_lsn, not any recent restart_lsn, i.e. in this case we have to:
> 
> 1. Create a slot with restart_lsn specified by user.
> 2. Do ReplicationSlotsComputeRequiredLSN() to prevent WAL removal.
> 3. Check that required WAL segment is still there and report ERROR to the user if it is not.

A similar situation to the one above may happen.

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

From: Alexey Kondratov
On 2020-06-19 21:57, Fujii Masao wrote:
> On 2020/06/19 23:20, Alexey Kondratov wrote:
>> On 2020-06-19 03:59, Michael Paquier wrote:
>>> On Thu, Jun 18, 2020 at 03:39:09PM +0300, Vyacheslav Makarov wrote:
>>>> If the WAL segment for the specified restart_lsn (STOP_LSN of the 
>>>> backup)
>>>> exists, then the function will create a physical replication slot 
>>>> and will
>>>> keep all the WAL segments required by the replica to catch up with 
>>>> the
>>>> primary. Otherwise, it returns error, which means that the required 
>>>> WAL
>>>> segments have been already utilised, so we do need to take a new 
>>>> backup.
>>>> Without passing this newly added parameter
>>>> pg_create_physical_replication_slot() works as before.
>>>> 
>>>> What do you think about this?
> 
> Currently pg_create_physical_replication_slot() and 
> CREATE_REPLICATION_SLOT
> replication command seem to be "idential". So if we add new option into 
> one,
> we should add it also into another?
> 

I wonder how it could be used via the replication protocol, but probably 
this option should be added there as well for consistency.

> 
> What happen if future LSN is specified in restart_lsn? With the patch,
> in this case, if the segment at that LSN exists (e.g., because it's 
> recycled
> one), the slot seems to be successfully created. However if the LSN is
> far future and the segment doesn't exist, the creation of slot seems to 
> fail.
> This behavior looks fragile and confusing. We should accept future LSN
> whether its segment currently exists or not?
> 

But what about a possible timeline switch? If we allow specifying an LSN 
as far in the future as one wants, then the segment with the specified 
LSN may end up being created on a different timeline after a switch, so 
it may be misleading. I am not even sure about allowing a future LSN for 
existing segments, since a PITR / timeline switch may occur just after 
the slot creation, so the pointer may never become valid. Would it be 
better to completely disallow future LSNs?

And here I noticed another detail in the patch. The TimeLineID of the 
last restartpoint/checkpoint is used to detect whether the WAL segment 
file exists or not. It means that if we try to create a slot just after 
a timeline switch, we cannot specify the oldest LSN actually available 
on disk, since it may belong to the previous timeline. One can only use 
an LSN within the current timeline. That seems fine, but it should be 
covered in the docs.
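
If I read it correctly, the existence check boils down to something like 
this (a sketch only; restart_lsn and last_ckpt_tli are placeholders for 
whatever the patch actually uses, while XLByteToSeg() and XLogFilePath() 
are existing helpers), which shows why the timeline ID is baked into the 
check:

    char        path[MAXPGPATH];
    struct stat statbuf;
    XLogSegNo   segno;

    /* Build the name of the segment holding restart_lsn on the last
     * checkpoint's timeline and see whether it is still on disk. */
    XLByteToSeg(restart_lsn, segno, wal_segment_size);
    XLogFilePath(path, last_ckpt_tli, segno, wal_segment_size);
    if (stat(path, &statbuf) != 0)
        ereport(ERROR,
                (errmsg("requested WAL segment %s does not exist", path)));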

> 
> +    if (!RecoveryInProgress() && !SlotIsLogical(MyReplicationSlot))
> 
> With the patch, the given restart_lsn seems to be ignored during 
> recovery.
> Why?
> 

I have the same question; I am not sure that this is needed here. It 
looks more like a leftover copied over from ReplicationSlotReserveWal().

>>> 
>>> I think that this was discussed in the past (perhaps one of the
>>> threads related to WAL advancing actually?),
>>> 
>> 
>> I have searched through the archives a bit and found one thread 
>> related to slots advancing [1]. It was dedicated to a problem of 
>> advancing slots which do not reserve WAL yet, if I get it correctly. 
>> Although it is somehow related to the topic, it was a slightly 
>> different issue, IMO.
>> 
>>> 
>>> and this stuff is full of
>>> holes when it comes to think about error handling with checkpoints
>>> running in parallel, potentially doing recycling of segments you 
>>> would
>>> expect to be around based on your input value for restart_lsn *while*
>>> pg_create_physical_replication_slot() is still running and
>>> manipulating the on-disk slot information.
>>> ...
>> 
>> These are the right concerns, but all of them should be applicable to 
>> the pg_create_physical_replication_slot() + immediately_reserve == 
>> true in the same way, doesn't it? I think so, since in that case we 
>> are doing a pretty similar thing — trying to reserve some WAL segment 
>> that may be concurrently deleted.
>> 
>> And this is exactly the reason why ReplicationSlotReserveWal() does it 
>> in several steps in a loop:
>> 
>> 1. Creates a slot with some restart_lsn.
>> 2. Does ReplicationSlotsComputeRequiredLSN() to prevent removal of the 
>> WAL segment with this restart_lsn.
>> 3. Checks that required WAL segment is still there.
>> 4. Repeat if this attempt to prevent WAL removal has failed.
> 
> What happens if concurrent checkpoint decides to remove the segment
> at restart_lsn before #2 and then actually removes it after #3?
> The replication slot is successfully created with the given 
> restart_lsn,
> but the reserved segment has already been removed?
> 

I thought about it a bit more, and it seems that yes, there is a race even 
for the current pg_create_physical_replication_slot() + 
immediately_reserve == true, i.e. ReplicationSlotReserveWal(). However, 
the chance is very small, since we take the current GetRedoRecPtr() there. 
One could probably reproduce it with wal_keep_segments = 1 by holding / 
releasing the backend doing the slot creation and the checkpointer with 
gdb, but I am not sure that it is an issue anywhere in the real world.

Maybe I am wrong, but it is not clear to me why the current 
ReplicationSlotReserveWal() routine does not have that race. I will try 
to reproduce it, though.

Things get worse when we allow specifying an older LSN, since it has a 
higher chance of being at the horizon of deletion by the checkpointer. 
Anyway, if I understand it correctly, with the current patch the slot 
will be created successfully, but it will be obsolete and should be 
invalidated by the next checkpoint.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



Re: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

From: Michael Paquier

On Mon, Jun 22, 2020 at 08:18:58PM +0300, Alexey Kondratov wrote:
> I wonder how it could be used via the replication protocol, but probably
> this option should be added there as well for consistency.

Mostly the same code path is taken by the SQL function and the
replication command, so when adding a new option, adding it to both
makes sense to me for consistency.  The SQL functions are actually
easier to use when it comes to tests, as there is no need to worry
about COPY_BOTH not being supported in psql.

> Things get worse when we allow specifying an older LSN, since it has a
> higher chances to be at the horizon of deletion by checkpointer. Anyway, if
> I get it correctly, with a current patch slot will be created successfully,
> but will be obsolete and should be invalidated by the next checkpoint.

Is that behavior acceptable for the end user?  For example, a
physical slot that is created to immediately reserve WAL may get
invalidated, causing it to actually not keep WAL around, contrary to
what the user wanted the command to do.
--
Michael

Attachment

Re: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot()

From: Alexey Kondratov
On 2020-06-23 04:18, Michael Paquier wrote:
> On Mon, Jun 22, 2020 at 08:18:58PM +0300, Alexey Kondratov wrote:
>> Things get worse when we allow specifying an older LSN, since it has a
>> higher chances to be at the horizon of deletion by checkpointer. 
>> Anyway, if
>> I get it correctly, with a current patch slot will be created 
>> successfully,
>> but will be obsolete and should be invalidated by the next checkpoint.
> 
> Is that a behavior acceptable for the end user?  For example, a
> physical slot that is created to immediately reserve WAL may get
> invalidated, causing it to actually not keep WAL around contrary to
> what the user has wanted the command to do.
> 

I can imagine that it could be acceptable for someone in the initially 
proposed scenario, since creation of a slot with a historical 
restart_lsn is already unpredictable — the required segment may or may 
not exist. However, adding undefined behaviour even after the slot 
creation does not look good to me anyway.

I have looked closely at the checkpointer code, and another problem is 
that it decides only once which WAL segments to delete, based on 
replicationSlotMinLSN, and does not re-check anything before the actual 
file deletion. That makes the window for a possible race even wider. I 
do not know how to completely get rid of this race without introducing 
some locking mechanism, which may be costly.
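
For reference, the relevant sequence at the end of CreateCheckPoint() in 
xlog.c looks roughly like this (a trimmed sketch; recptr and _logSegNo 
are local variables of that function):

    /* Compute the removal cutoff once, using the slot minimum as of now. */
    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
    KeepLogSeg(recptr, &_logSegNo);   /* clamped by the slots' minimum LSN,
                                       * wal_keep_segments, etc. */
    _logSegNo--;

    /* Unlink/recycle older segments without re-checking the slots. */
    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);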

Thanks for the feedback
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company