Home > mailing lists

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date	November 20 23:40:50
Msg-id	104b2fa9-407d-439f-b760-156ccd97b571@vondra.me Whole thread Raw
In response to	Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly ("Vitaly Davydov" <v.davydov@postgrespro.ru>)
List	pgsql-hackers

Tree view


On 11/20/24 14:40, Vitaly Davydov wrote:
> Dear Hackers,
> 
>  
> 
> To ping the topic, I'd like to clarify what may be wrong with the idea
> described here, because I do not see any interest from the community.
> The topic is related to physical replication. The primary idea is to
> define the horizon of WAL segments (files) removal based on saved on
> disk restart LSN values. Now, the WAL segment removal horizon is
> calculated based on the current restart LSN values of slots, that can
> not be saved on disk at the time of the horizon calculation. The case
> take place when a slot is advancing during checkpoint as described
> earlier in the topic.
> 

Yeah, a simple way to fix this might be to make sure we don't use the
required LSN value set after CheckPointReplicationSlots() to remove WAL.
AFAICS the problem is KeepLogSeg() gets the new LSN value by:

   keep = XLogGetReplicationSlotMinimumLSN();

Let's say we get the LSN before calling CheckPointGuts(), and then pass
it to KeepLogSeg, so that it doesn't need to get the fresh value.

Wouldn't that fix the issue?

> 
> Such behaviour is not a problem when slots are used only for physical
> replication in a conventional way. But it may be a problem when physical
> slot is used for some other goals. For example, I have an extension
> which keeps the WAL using physical replication slots. It creates a new
> physical slot and advances it as needed. After restart, it can use
> restart lsn of the slot to read WAL from this LSN. In this case, there
> is no guarantee that restart lsn will point to an existing WAL segment.
> 

Yeah.

> 
> The advantage of the current behaviour is that it requires a little bit
> less WAL to keep. The disadvantage is that physical slots do not
> guarantee WAL keeping starting from its' restart lsns in general.
> 

If it's wrong, it doesn't really matter it has some advantages.


regards

-- 
Tomas Vondra

pgsql-hackers by date:

From: "Devulapalli, Raghuveer"
Date: 20 November, 23:37:33
Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From: Tom Lane
Date: 20 November, 23:43:16
Subject: Re: EphemeralNamedRelation and materialized view

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly - Mailing list pgsql-hackers

Previous

Next