Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Msg-id CAA4eK1+qL9QFOD2Q5kcq0Ff=7OcBLE34QuDBdPcdznUzQwv+eg@mail.gmail.com
In response to Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly  ("Vitaly Davydov" <v.davydov@postgrespro.ru>)
Responses RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
List pgsql-hackers
On Wed, Jun 18, 2025 at 10:17 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
>
> On Wed, Jun 18, 2025 at 6:50 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
> > > I think, it is a good idea. Once we do not use the generated data, it is ok
> > > just to generate WAL segments using the proposed function. I've tested this
> > > function. The tests worked as expected with and without the fix. The attached
> > > patch does the change.
> >
> > Sorry, forgot to attach the patch. It is created on the current master branch.
> > It may conflict with your corrections. I hope, it could be useful.
>
> Thank you.  I've integrated this into a patch to improve these tests.
>
> Regarding assertion failure, I've found that assert in
> PhysicalConfirmReceivedLocation() conflicts with restart_lsn
> previously set by ReplicationSlotReserveWal().  As I can see,
> ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
> So, it doesn't seem there is a guarantee that restart_lsn never goes
> backward.  The comment in ReplicationSlotReserveWal() even states there
> is a "chance that we have to retry".
>

I don't see how this theory can lead to a slot's restart_lsn going
backwards. The retry mentioned there is just another attempt to reserve
the slot's position when the required WAL has already been removed. Such
a retry can only yield a position later than the previous restart_lsn.
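
A minimal sketch of the retry argument, written as a toy model rather than
the actual ReplicationSlotReserveWal() code. The names ToyWal,
toy_checkpoint and toy_reserve_wal are hypothetical, and the model assumes
that the redo pointer never moves backward and that a checkpoint removes
all WAL older than its redo pointer. Under those assumptions, each retry
picks a candidate at or beyond the previous one:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;

/* Hypothetical toy model of the WAL state; not PostgreSQL source. */
typedef struct
{
    Lsn redo_ptr;     /* latest checkpoint redo pointer */
    Lsn oldest_kept;  /* oldest LSN whose WAL is still on disk */
} ToyWal;

/* A checkpoint advances the redo pointer and drops all older WAL. */
void
toy_checkpoint(ToyWal *wal, Lsn new_redo)
{
    /* Assumption of this model: the redo pointer only moves forward. */
    assert(new_redo >= wal->redo_ptr);
    wal->redo_ptr = new_redo;
    wal->oldest_kept = new_redo;
}

/*
 * Pick the current redo pointer as the candidate restart_lsn, then verify
 * that the WAL it points to still exists.  racing_checkpoints simulates
 * checkpoints that slip in between the pick and the verification, which
 * is the only case in which the loop has to retry.
 */
Lsn
toy_reserve_wal(ToyWal *wal, int racing_checkpoints, int *attempts)
{
    Lsn restart_lsn;

    *attempts = 0;
    for (;;)
    {
        (*attempts)++;
        restart_lsn = wal->redo_ptr;

        if (racing_checkpoints > 0)
        {
            racing_checkpoints--;
            toy_checkpoint(wal, wal->redo_ptr + 100);
        }

        /* Candidate is still backed by existing WAL: done. */
        if (restart_lsn >= wal->oldest_kept)
            break;

        /* Otherwise retry; the next candidate is a newer redo pointer. */
    }
    return restart_lsn;
}
```

In this model, starting from redo_ptr = 1000 with two racing checkpoints,
the loop takes three attempts and settles on 1200; none of the candidates
is lower than an earlier one.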

>  Thus, I propose to remove the
> assertion introduced by ca307d5cec90.
>

If what I said above is correct, then the following part of the commit
message will be incorrect:
"As stated in the ReplicationSlotReserveWal() comment, this is not
always true. Additionally, this issue has been spotted by some
buildfarm members."

--
With Regards,
Amit Kapila.


