On Tue, Mar 26, 2024 at 2:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Mar-26, Amit Kapila wrote:
>
> > On Tue, Mar 26, 2024 at 1:09 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > On 2024-Mar-26, Amit Kapila wrote:
> > > > I would also like to solicit your opinion on the other slot-level
> > > > parameter we are planning to introduce. This new slot-level parameter
> > > > will be named inactive_timeout.
> > >
> > > Maybe inactivity_timeout?
> > >
> > > > This will indicate that once the slot has been inactive for the
> > > > inactive_timeout period, we will invalidate the slot. We are also
> > > > discussing having this parameter (inactive_timeout) as a GUC [1]. We
> > > > can have this new parameter both at the slot level and as a GUC,
> > > > or just one of those.
> > >
> > > replication_slot_inactivity_timeout?
> >
> > So, it seems you are okay to have this parameter both at slot level
> > and as a GUC.
>
> Well, I think a GUC is good to have regardless of the slot parameter,
> because the GUC can be used as an instance-wide protection against going
> out of disk space because of broken replication. However, now that I
> think about it, I'm not really sure about invalidating a slot based on
> time rather than on disk space, for which we already have a parameter;
> what's your rationale for that? The passage of time is not a very good
> measure, really, because the rate at which the protected WAL is
> produced varies wildly over time.
>
An inactive slot not only blocks WAL from being removed but also holds
back the xmin horizon, which prevents vacuum from removing dead rows.
That in turn creates a risk of transaction ID wraparound. See email [1]
for more context.

> I can only see a timeout being useful as a parameter if its default
> value is not the special disable value; say, the default timeout is 3
> days (to be more precise -- the period from Friday to Monday, that is,
> between the DBA leaving the office one week and discovering a problem
> on returning early the next week). This way we have a built-in mechanism that
> invalidates slots regardless of how big the WAL partition is.
>
We can have a non-disabled default value for this parameter, but it has
the potential to break replication, so I'm not sure what a good default
value would be.
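
To make the trade-off concrete, here is a rough sketch of the check being
discussed. It is not taken from any patch: the struct, the function name,
and the choice of 0 as the "disabled" value are all assumptions made for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: 0 is assumed to be the special "disabled" value. */
#define INACTIVITY_TIMEOUT_DISABLED 0

/* Hypothetical, simplified view of a slot; not the real ReplicationSlot. */
typedef struct SlotSketch
{
    bool    active;             /* is some process currently using the slot? */
    int64_t inactive_since;     /* seconds since epoch when it went idle */
} SlotSketch;

/*
 * Return true if the slot has been idle for at least timeout_secs and
 * would therefore be a candidate for invalidation.
 */
static bool
slot_inactivity_expired(const SlotSketch *slot, int64_t now, int64_t timeout_secs)
{
    assert(slot != NULL);

    /* With the timeout disabled, a slot is never invalidated for idleness. */
    if (timeout_secs == INACTIVITY_TIMEOUT_DISABLED)
        return false;

    /* A slot that is in use is not idle, however old inactive_since is. */
    if (slot->active)
        return false;

    return (now - slot->inactive_since) >= timeout_secs;
}
```

With a default of 3 days (259200 seconds), a subscriber that stays down over
a long weekend plus one extra day would find its slot invalidated on return,
which is the replication breakage I am worried about.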
>
> I'm less sure about the slot parameter; in what situation do you need to
> extend the life of one individual slot further than the life of all the
> other slots?

I was thinking of an idle slot scenario where a slot from one
particular subscriber (or output plugin) is inactive due to some
maintenance activity. But it should be okay to have just a GUC for this
for now.

[1] - https://www.postgresql.org/message-id/20240325195443.GA2923888%40nathanxps13
--
With Regards,
Amit Kapila.