Re: another autovacuum scheduling thread - Mailing list pgsql-hackers

From Robert Haas
Subject Re: another autovacuum scheduling thread
Msg-id CA+TgmoZVdQvtiQJPcKyK7Kfc-J=_jph2HqiDufb1jxD836LfYA@mail.gmail.com
In response to Re: another autovacuum scheduling thread  (Jeremy Schneider <schneider@ardentperf.com>)
List pgsql-hackers
On Fri, Oct 10, 2025 at 6:00 PM Jeremy Schneider
<schneider@ardentperf.com> wrote:
> The spectacular failures I've seen with autovac usually come down to
> things like too much sleeping (cost_delay) or too few workers, where
> better ordering would be nice but probably wouldn't fix any real
> problems leading to the spectacular failures

Since I have said the same thing myself, I can hardly disagree.
However, there are probably a few exceptions. For instance, if
autovacuum on a certain table is failing repeatedly or accomplishing
nothing without removing the apparent need to autovacuum, and happens
to be the first one in pg_class, it could divert a lot of attention
from other tables.

> Robert it sounds to me like the main use case you're focused on here
> is where basically wraparound is imminent - we are already screwed - and
> our very last hope was that a last-ditch autovac can finish just in time

Yes, I would argue that this is the scenario that really matters. As
you say above, the main thing is having little enough sleeping and a
sufficient number of workers. When that's the case, we can do the work
in any order and life will mostly be fine. However, if we get into a
desperate situation by, say, having one table that can't be vacuumed,
and eventually someone fixes that, say by dropping the corrupt index
that is preventing vacuuming of that table, we might like it if
autovacuum focused on getting that table vacuumed rather than getting
lost in the sauce. Of course, if we have the pretty common situation
where autovacuum gets behind on all tables, say due to a stale
replication slot, then this is less critical, although a perfect
system would probably prioritize vacuuming the *largest* tables in
this situation, since those will take the longest to finish, and it's
when a vacuum of every table in the cluster has been *completed* that
the XID horizons can advance.
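
To make the point about the largest tables concrete, here is a minimal sketch, not a description of how autovacuum actually schedules work. It assumes a fixed pool of workers, each of which takes the next table from an ordered list as soon as it is free, and it uses made-up vacuum durations. The function name and the numbers are hypothetical.

```python
import heapq

def completion_time(durations, workers):
    """Time at which the last vacuum finishes when each free worker
    takes the next table from the list, in the order given."""
    free_at = [0.0] * workers  # min-heap: when each worker next becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

# Hypothetical vacuum durations in hours; one table dwarfs the rest.
durations = [1, 1, 1, 1, 1, 1, 10]

smallest_first = completion_time(sorted(durations), workers=3)
largest_first = completion_time(sorted(durations, reverse=True), workers=3)

print(smallest_first, largest_first)
```

Under these assumptions, starting the big table first lets the small ones fill in around it, so the point at which every table has been vacuumed arrives sooner than if the big table is left for last.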

> I hope y'all just pick something and commit it without getting too lost
> in the details. I honestly think in the list of improvements around
> autovac, this is the lowest priority on my list of hopes and dreams as a
> user for wraparound prevention :) because if this ever matters to me for
> avoiding wraparound, I was screwed long before we got to this point and
> this is not going to fix my underlying problems.

I'm not sure if this was your intention, but to me this kind of reads
like "well, it's not going to matter anyway so just do whatever and
move on" and I don't agree with that. I think that if we're not going
to do high-quality engineering here, we just shouldn't change anything
at all. It's better to keep having the same bad behavior than for each
release to have new and different bad behavior. One possible positive
result of leaning into this prioritization problem is that whoever's
working on it (Nathan, in this case) might gain some useful insights
about how to tackle some of the other problems in this space. All of
this is hard enough that we haven't really had any major improvements
in this area since, I want to say, 8.3, and it's desirable to break
that logjam even if we don't all agree on which problems are most
urgent. Even if I ultimately don't agree with whatever Nathan wants to
do or proposes, I'm glad he's trying to do something, which is (in my
experience) generally much better than making no effort at all.

--
Robert Haas
EDB: http://www.enterprisedb.com


