On 2020-06-23 04:18, Michael Paquier wrote:
> On Mon, Jun 22, 2020 at 08:18:58PM +0300, Alexey Kondratov wrote:
>> Things get worse when we allow specifying an older LSN, since it has a
>> higher chances to be at the horizon of deletion by checkpointer.
>> Anyway, if I get it correctly, with a current patch slot will be
>> created successfully, but will be obsolete and should be invalidated
>> by the next checkpoint.
>
> Is that a behavior acceptable for the end user? For example, a
> physical slot that is created to immediately reserve WAL may get
> invalidated, causing it to actually not keep WAL around contrary to
> what the user has wanted the command to do.
>
I can imagine that this could be acceptable to someone in the initially
proposed scenario, since creating a slot with a historical restart_lsn
is already unpredictable: the required segment may or may not exist.
However, adding undefined behaviour that persists even after the slot
has been created does not look good to me anyway.

I have looked closely at the checkpointer code, and another problem is
that it decides only once which WAL segments to delete, based on
replicationSlotMinLSN, and does not re-check anything before the actual
file deletion. That makes the window for a possible race even wider. I
do not know how to get rid of this race completely without introducing
some locking mechanism, which may be costly.
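
To make the window concrete, here is a toy, single-threaded model of the
interleaving I have in mind. It is only a sketch and not PostgreSQL
source: ToyCluster, the toy_* functions, the segment numbers and the
"recheck" flag are all hypothetical names invented for illustration, and
the real checkpointer works with LSNs and files rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT PostgreSQL code: every name below is hypothetical. */
#define NSEGS   8
#define NO_SLOT UINT64_MAX

typedef struct {
    bool     seg_exists[NSEGS]; /* seg_exists[i]: WAL segment i is on disk */
    uint64_t slot_min_seg;      /* oldest segment any slot needs, or NO_SLOT */
} ToyCluster;

static void toy_init(ToyCluster *c)
{
    for (int i = 0; i < NSEGS; i++)
        c->seg_exists[i] = true;
    c->slot_min_seg = NO_SLOT;
}

/* Checkpointer, step 1: decide ONCE how far back segments may be removed. */
static uint64_t toy_compute_cutoff(const ToyCluster *c, uint64_t checkpoint_seg)
{
    return c->slot_min_seg < checkpoint_seg ? c->slot_min_seg : checkpoint_seg;
}

/* Slot creation at a historical position: succeeds if the segment exists. */
static bool toy_create_slot(ToyCluster *c, uint64_t seg)
{
    assert(seg < NSEGS);
    if (!c->seg_exists[seg])
        return false;
    if (seg < c->slot_min_seg)
        c->slot_min_seg = seg;
    return true;
}

/*
 * Checkpointer, step 2: remove every segment below the cutoff.
 * With recheck == false the stale cutoff from step 1 is trusted blindly.
 * With recheck == true the slot minimum is re-read before each removal.
 */
static void toy_remove_old_segments(ToyCluster *c, uint64_t cutoff, bool recheck)
{
    for (uint64_t i = 0; i < cutoff && i < NSEGS; i++)
    {
        if (recheck && i >= c->slot_min_seg)
            break;
        c->seg_exists[i] = false;
    }
}

/* Returns true if the new slot's segment survives the interleaving. */
static bool toy_race(bool recheck)
{
    ToyCluster c;

    toy_init(&c);
    uint64_t cutoff = toy_compute_cutoff(&c, 5); /* no slots yet */
    bool created = toy_create_slot(&c, 2);       /* lands between the steps */
    toy_remove_old_segments(&c, cutoff, recheck);
    return created && c.seg_exists[2];
}
```

In this model toy_race(false) reports that the slot was created but its
segment is gone, while toy_race(true) keeps the segment. Note that in a
real concurrent system a re-check would only narrow the window between
the check and the file removal, not close it, which is why I think some
form of locking would still be needed.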

Thanks for the feedback.

--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company