Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From: Bertrand Drouvot
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Msg-id: ZfvXiPlz/lackQlp@ip-10-97-1-34.eu-west-3.compute.internal
In response to: Re: Introduce XID age and inactive timeout based replication slot invalidation (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Introduce XID age and inactive timeout based replication slot invalidation
List: pgsql-hackers
Hi,

On Thu, Mar 21, 2024 at 11:43:54AM +0530, Amit Kapila wrote:
> On Thu, Mar 21, 2024 at 11:23 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote:
> > > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >
> > > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > > > >
> > > > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > > > > replication slot data structure.
> > > >
> > > > Should last_inactive_at be tracked on disk? Say the engine is down for a
> > > > period of time > inactive_timeout; then the slot will be invalidated after
> > > > the engine restarts (if there is no activity before we invalidate the
> > > > slot). Should the time the engine is down be counted as "inactive" time?
> > > > I have the feeling it should not, and that we should only take inactive
> > > > time into account while the engine is up.
> > > >
> > >
> > > Good point. The question is how do we achieve this without persisting
> > > the 'last_inactive_at'? Say 'last_inactive_at' for a particular slot
> > > had some valid value before we shut down, but it still didn't cross the
> > > configured 'inactive_timeout' value, so we won't be able to invalidate
> > > it. Now, after the restart, as we don't know last_inactive_at's value
> > > before the shutdown, we will initialize it with 0 (this is what Bharath
> > > seems to have done in the latest v13-0002* patch). After this, even if a
> > > walsender or backend never acquires the slot, we won't invalidate it.
> > > OTOH, if we track 'last_inactive_at' on disk, after restart we could
> > > initialize it to the current time if the value is non-zero. Do you have
> > > any better ideas?
> > >
> >
> > I think that setting last_inactive_at when we restart makes sense if the
> > slot has been active previously. I think the idea is that the slot is
> > holding xmin/catalog_xmin and we don't want to prevent row removal for
> > longer than the timeout.
> >
> > So what about relying on xmin/catalog_xmin instead, that way?
> >
>
> That doesn't sound like a great idea because the xmin/catalog_xmin values
> won't tell us before restart whether it was active or not. It could have
> been inactive for a long time before restart but the xmin values could
> still be valid.

Right, the idea here was more like "don't hold xmin/catalog_xmin" for longer
than the timeout.

My concern is that we set catalog_xmin at logical slot creation time. So if we
set last_inactive_at to zero at creation time and the slot is not used for a
period of time > timeout, then I think it's not helping there.

> What about we always set 'last_inactive_at' at restart (if the slot's
> inactive_timeout has a non-zero value) and reset it as soon as someone
> acquires that slot? Now, if the slot doesn't get acquired till
> 'inactive_timeout', checkpointer will invalidate the slot.

Yeah, that sounds good to me, but I think we should set last_inactive_at at
creation time too; if not:

- a physical slot could remain valid for a long time after creation (which is
  fine), but the behavior would change at restart.
- a logical slot would have the "issue" reported above (holding catalog_xmin).

(A rough sketch of this lifecycle, for illustration, follows at the end of
this message.)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
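[Editorial note: the following is a minimal, standalone C sketch of the
last_inactive_at lifecycle being discussed above (arm the clock at creation
and at restart, reset it on acquire, re-arm on release, invalidate from the
checkpointer). It is not the actual patch; SketchSlot and all function names
here are simplified stand-ins for the real ReplicationSlot machinery.]

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef struct SketchSlot
{
    int     inactive_timeout;   /* seconds; 0 disables the mechanism */
    time_t  last_inactive_at;   /* 0 while the slot is acquired */
    bool    invalidated;
} SketchSlot;

/*
 * At slot creation and at server restart: start the inactivity clock,
 * so a slot that is never acquired can still time out, and downtime is
 * not counted as inactive time.
 */
static void
slot_arm_timeout(SketchSlot *slot)
{
    if (slot->inactive_timeout > 0)
        slot->last_inactive_at = time(NULL);
}

/* On acquire: the slot is active, so stop the clock. */
static void
slot_acquire(SketchSlot *slot)
{
    slot->last_inactive_at = 0;
}

/* On release: restart the clock. */
static void
slot_release(SketchSlot *slot)
{
    slot_arm_timeout(slot);
}

/*
 * Checkpointer-side check: invalidate a slot that has stayed
 * unacquired for longer than its inactive_timeout.
 */
static void
slot_check_timeout(SketchSlot *slot, time_t now)
{
    if (slot->inactive_timeout > 0 &&
        slot->last_inactive_at != 0 &&
        now - slot->last_inactive_at > slot->inactive_timeout)
        slot->invalidated = true;
}

int
main(void)
{
    SketchSlot slot = { .inactive_timeout = 5 };

    slot_arm_timeout(&slot);    /* creation (or restart) */

    /* Pretend 10 seconds pass with no one acquiring the slot. */
    slot_check_timeout(&slot, time(NULL) + 10);
    printf("invalidated: %d\n", slot.invalidated);      /* prints 1 */

    (void) slot_acquire;        /* unused in this tiny demo */
    (void) slot_release;
    return 0;
}
```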