Re: pgsql: Track last_inactive_time in pg_replication_slots. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: pgsql: Track last_inactive_time in pg_replication_slots.
Date
Msg-id CAA4eK1+SLfHrS_haOvD5oXk6cwD9Szh8BYR5MpPRJaBDMaELXA@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: Track last_inactive_time in pg_replication_slots.  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: pgsql: Track last_inactive_time in pg_replication_slots.
List pgsql-hackers
On Tue, Mar 26, 2024 at 2:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Mar-26, Amit Kapila wrote:
>
> > On Tue, Mar 26, 2024 at 1:09 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > On 2024-Mar-26, Amit Kapila wrote:
> > > > I would also like to solicit your opinion on the other slot-level
> > > > parameter we are planning to introduce.  This new slot-level parameter
> > > > will be named as inactive_timeout.
> > >
> > > Maybe inactivity_timeout?
> > >
> > > > This will indicate that once the slot is inactive for the
> > > > inactive_timeout period, we will invalidate the slot. We are also
> > > > discussing to have this parameter (inactive_timeout) as GUC [1]. We
> > > > can have this new parameter both at the slot level and as well as a
> > > > GUC, or just one of those.
> > >
> > > replication_slot_inactivity_timeout?
> >
> > So, it seems you are okay to have this parameter both at slot level
> > and as a GUC.
>
> Well, I think a GUC is good to have regardless of the slot parameter,
> because the GUC can be used as an instance-wide protection against going
> out of disk space because of broken replication.  However, now that I
> think about it, I'm not really sure about invalidating a slot based on
> time rather on disk space, for which we already have a parameter; what's
> your rationale for that?  The passage of time is not a very good
> measure, really, because the amount of WAL being protected has wildly
> varying production rate across time.
>

The inactive slot not only blocks WAL from being removed but prevents
the vacuum from proceeding. Also, there is a risk of transaction Id
wraparound. See email [1] for more context.

> I can only see a timeout being useful as a parameter if its default
> value is not the special disable value; say, the default timeout is 3
> days (to be more precise -- the period from Friday to Monday, that is,
> between DBA leaving the office one week until discovering a problem when
> he returns early next week).  This way we have a built-in mechanism that
> invalidates slots regardless of how big the WAL partition is.
>

We can have a default value for this parameter but it has the
potential to break the replication, so not sure what could be a good
default value.

>
> I'm less sure about the slot parameter; in what situation do you need to
> extend the life of one individual slot further than the life of all the
> other slots?

I was thinking of an idle slot scenario where a slot from one
particular subscriber (or output plugin) is inactive due to some
maintenance activity. But it should be okay to have a GUC for this for
now.

[1] - https://www.postgresql.org/message-id/20240325195443.GA2923888%40nathanxps13

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: Peter Eisentraut
Date:
Subject: Re: Improve readability by using designated initializers when possible