Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Msg-id CAPpHfdvjGWo--xqqjJbyb_amdkhqamnzrwCZWe_hBD-rSTFbBg@mail.gmail.com
In response to RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers

Dear Kuroda-san,

On Thu, Jun 19, 2025 at 2:05 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
> > > Regarding assertion failure, I've found that assert in
> > > PhysicalConfirmReceivedLocation() conflicts with restart_lsn
> > > previously set by ReplicationSlotReserveWal().  As I can see,
> > > ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
> > > So, it doesn't seem there is a guarantee that restart_lsn never goes
> > > backward.  The comment in ReplicationSlotReserveWal() even states there
> > > is a "chance that we have to retry".
> > >
> >
> > I don't see how this theory can lead to a restart_lsn of a slot going
> > backwards. The retry mentioned there is just a retry to reserve the
> > slot's position again if the required WAL is already removed. Such a
> > retry can only get the position later than the previous restart_lsn.
>
> We analyzed the assertion failure that happened in pg_basebackup/020_pg_receivewal
> and confirmed that restart_lsn can go backward. This means that the Assert() added
> by ca307d5 can cause a crash.
>
> Background
> ===========
> When pg_receivewal starts replication using a replication slot, it sets the
> start point to the beginning of the segment that contains restart_lsn.
> E.g., if the restart_lsn of the slot is 0/B000D0, pg_receivewal requests WAL
> from 0/B00000.
> For more details on this behavior, see f61e1dd2 and d9bae531.
>
> What happened here
> ==================
> Based on the above, the walsender sent WAL from the beginning of the segment
> (0/B00000). When the walreceiver received it, it tried to send a reply. At that
> time the flushed location of WAL would be 0/B00000. The walsender sets the
> received LSN as restart_lsn in PhysicalConfirmReceivedLocation(). Here the
> restart_lsn went backward (0/B000D0 -> 0/B00000).
>
> The assertion failure could happen if a CHECKPOINT ran at that time.
> The slot's last_saved_restart_lsn was 0/B000D0, but its data.restart_lsn
> was 0/B00000, which violates the assertion added in InvalidatePossiblyObsoleteSlot().

Thank you for your detailed explanation!
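
To make the arithmetic in your example concrete, here is a minimal
standalone sketch of the rounding step.  It is not the pg_receivewal code:
the sketch_* names are hypothetical, and the 1 MiB segment size is my
assumption, chosen because it is the size at which 0/B000D0 rounds down to
0/B00000.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* 64-bit LSN, as in PostgreSQL */

/* Assumption: 1 MiB segments, so that the numbers from the thread work out. */
#define SKETCH_WAL_SEGMENT_SIZE ((uint64_t) 1024 * 1024)

/* Build an LSN from the "hi/lo" notation used above, e.g. 0/B000D0. */
static XLogRecPtr
sketch_make_lsn(uint32_t hi, uint32_t lo)
{
	return ((XLogRecPtr) hi << 32) | lo;
}

/* Round an LSN down to the first byte of the segment that contains it. */
static XLogRecPtr
sketch_segment_start(XLogRecPtr lsn, uint64_t seg_size)
{
	/* WAL segment sizes are powers of two, so a mask is sufficient. */
	assert(seg_size != 0 && (seg_size & (seg_size - 1)) == 0);
	return lsn & ~(seg_size - 1);
}
```

With these assumptions, sketch_segment_start() applied to 0/B000D0 yields
0/B00000, which is lower than the slot's restart_lsn.  Once that value is
reported back as flushed, restart_lsn moves backward.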

> Note
> ====
> 1.
> In this case, starting from the beginning of the segment is not a problem, because
> the checkpoint process only removes WAL files from segments that precede the
> restart_lsn's wal segment. The current segment (0/B00000) will not be removed,
> so there is no risk of data loss or inconsistency.
>
> 2.
> A similar pattern applies to pg_basebackup. Both use logic that adjusts the
> requested streaming position to the start of the segment, and both report the
> received LSN as flushed.
>
> 3.
> I considered the theory above, but I could not reproduce the
> 040_standby_failover_slots_sync failure because it is a timing issue. Has
> anyone else reproduced it?
>
> We are still investigating the failure in 040_standby_failover_slots_sync.

I didn't manage to reproduce this.  But I see from the logs [1] on
mamba that a START_REPLICATION command was issued just before the assert
trap.  Could it be something similar to what I described in [2]?
Namely:
1. ReplicationSlotReserveWal() sets restart_lsn for the slot.
2. Concurrent checkpoint flushes that restart_lsn to the disk.
3. PhysicalConfirmReceivedLocation() sets restart_lsn of the slot to
the beginning of the segment.
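
The three steps above can be replayed in a toy model.  This is only a
sketch: SketchSlot and the sketch_* functions are hypothetical stand-ins,
not the real slot code, and the invariant checked at the end (restart_lsn
is never behind last_saved_restart_lsn) is my reading of the assertion
discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical, stripped-down stand-in for a replication slot. */
typedef struct SketchSlot
{
	XLogRecPtr	restart_lsn;			/* in-memory value */
	XLogRecPtr	last_saved_restart_lsn; /* value last flushed to disk */
} SketchSlot;

/* Assumed invariant: the in-memory value never falls behind the saved one. */
static bool
sketch_invariant_holds(const SketchSlot *slot)
{
	return slot->restart_lsn >= slot->last_saved_restart_lsn;
}

/*
 * Replay the sequence: reserve WAL at 'reserved', let a checkpoint save
 * the slot, then accept 'flushed' as the position confirmed by the client.
 * Returns whether the invariant still holds afterwards.
 */
static bool
sketch_replay(XLogRecPtr reserved, XLogRecPtr flushed)
{
	SketchSlot	slot = {0, 0};

	/* 1. Reservation sets restart_lsn. */
	slot.restart_lsn = reserved;

	/* 2. A concurrent checkpoint flushes that value to disk. */
	slot.last_saved_restart_lsn = slot.restart_lsn;
	assert(slot.restart_lsn == slot.last_saved_restart_lsn);

	/* 3. The confirmed position overwrites restart_lsn unconditionally. */
	slot.restart_lsn = flushed;

	return sketch_invariant_holds(&slot);
}
```

With the values from Kuroda-san's analysis, sketch_replay(0xB000D0,
0xB00000) returns false, while a confirmed position at or beyond the
reserved one keeps the invariant intact.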

[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=mamba&dt=2025-06-17%2005%3A10%3A36&stg=recovery-check
[2] https://www.postgresql.org/message-id/CAPpHfdv3UEUBjsLhB_CwJT0xX9LmN6U%2B__myYopq4KcgvCSbTg%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase
