On Wed, Mar 27, 2024 at 3:13 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Yeah, I think that both make sense. The reason is that one depends on the
> database activity and slot activity (the xid age one) while the other (the
> timeout one) depends only on the slot activity.

FWIW, I thought the time-based one sounded more useful. I think it
would be poor planning to say "well, if the slot reaches an XID age of
a billion, kill it so we don't wrap around," because while that likely
will prevent me from getting into wraparound trouble, my database is
likely to become horribly bloated long before the cutoff is reached. I
thought it would be easier to reason in terms of time: I don't expect
a standby to ever be down for more than X period of time, say an hour or
whatever, so if it is, forget about it. Or alternatively, I know that
if a standby does go down for more than X period of time, I start to get
bloat, so cut it off at that point and I'll rebuild it later. I feel
like these are things where people's intuition is going to be much
stronger when reckoning in units of wall-clock time, which everyone
deals with every day in one way or another, rather than in XID-based
units that are, at least in my view, just a lot less intuitive.
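
To put rough numbers on that: how long a stalled slot takes to reach a
given XID age depends entirely on how fast the database consumes XIDs.
Below is a minimal Python sketch of the arithmetic. The function name
and the rates are made up for illustration, and it assumes XIDs are
consumed at a constant rate, which real workloads generally won't do.

```python
from datetime import timedelta


def time_to_reach_xid_age(xid_age_limit: int, xids_per_second: float) -> timedelta:
    """Wall-clock time for a stalled slot to reach xid_age_limit.

    Hypothetical helper: assumes a constant XID consumption rate.
    """
    if xids_per_second <= 0:
        raise ValueError("xids_per_second must be positive")
    return timedelta(seconds=xid_age_limit / xids_per_second)


if __name__ == "__main__":
    # Same billion-XID cutoff, three illustrative consumption rates.
    for rate in (10, 1_000, 100_000):
        print(f"{rate:>7} XIDs/s: {time_to_reach_xid_age(1_000_000_000, rate)}")
```

At 1,000 XIDs per second the billion-XID cutoff is about eleven and a
half days away; at 10 per second it is more than three years away. A
one-hour timeout is one hour in both cases.
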
For a previous example of where an XID threshold turned out not to be
great, see vacuum_defer_cleanup_age, and in particular the message of
commit 1118cd37eb61e6a2428f457a8b2026a7bb3f801a, which removed it. The
case here might not turn
out to be quite comparable for one reason or another, but I do think
that case is a cautionary tale.

I'm sure the world won't end or anything if we end up with both
thresholds, and I may be missing some reason why the XID threshold
would be really great here. I just can't quite see why I'd ever
recommend it to anyone.

--
Robert Haas
EDB: http://www.enterprisedb.com