On Tue, Dec 17, 2024 at 9:11 AM Tomas Vondra <tomas@vondra.me> wrote:
> I don't follow. How could non-aggressive VACUUM advance relfrozenxid,
> ever? I mean, if it doesn't guarantee freezing all pages, how could it?

Although it's very workload dependent, it still happens all the time.
Just look at the autovacuum log output from almost any autovacuum that
runs during the regression tests. Or look at the autovacuum output
for the small pgbench tables.

In general, relfrozenxid simply tracks the oldest possible extant XID
in the table. VACUUM doesn't necessarily need to do any freezing to
advance relfrozenxid/relminmxid. But VACUUM *must* exhaustively scan
every heap page that could possibly contain an old XID in order to be
able to advance relfrozenxid/relminmxid safely. A non-aggressive VACUUM
can do that whenever it happens not to skip any all-visible pages that
aren't also all-frozen.

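To make that rule concrete, here is a toy model in Python. It is not
PostgreSQL code: the Page class and new_relfrozenxid() are names made
up for illustration, and XID wraparound is ignored so that XIDs compare
as plain integers. It only shows that advancement depends on which
pages were scanned, not on whether anything was frozen.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Page:
    """Hypothetical heap page: its unfrozen XIDs plus its two VM bits."""
    xids: List[int] = field(default_factory=list)
    all_visible: bool = False
    all_frozen: bool = False


def new_relfrozenxid(pages: List[Page], scanned: Set[int],
                     oldest_xmin: int) -> Optional[int]:
    """Return a safe new relfrozenxid, or None if it cannot be advanced.

    Assumption: no freezing happens in this model, so the result is just
    the oldest XID seen on scanned pages, capped at oldest_xmin.
    """
    candidate = oldest_xmin
    for blkno, page in enumerate(pages):
        if blkno in scanned:
            # Scanned page: every unfrozen XID on it bounds the new value.
            for xid in page.xids:
                candidate = min(candidate, xid)
        elif not page.all_frozen:
            # Skipped a page that might hold an old XID: not safe.
            return None
    return candidate


pages = [
    Page(xids=[100, 150]),                    # not all-visible
    Page(all_visible=True, all_frozen=True),  # safe to skip
    Page(xids=[120], all_visible=True),       # all-visible, not all-frozen
]

# Nothing frozen, no page with an old XID skipped: advancement is possible.
print(new_relfrozenxid(pages, scanned={0, 2}, oldest_xmin=200))  # 100
# Skipping the all-visible-not-all-frozen page rules advancement out.
print(new_relfrozenxid(pages, scanned={0}, oldest_xmin=200))     # None
```
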
> > That's an interesting idea. And it seems like a much more effective
> > way of getting some relfrozenxid advancement than hoping that the
> > pages you scan due to SKIP_PAGES_THRESHOLD end up being enough to have
> > scanned all unfrozen tuples.
>
> But I think that (a) is going to be fairly complex, because how do you
> cost the future vacuum?, and (b) somewhat misses my point that on
> modern NVMe SSD storage (SKIP_PAGES_THRESHOLD > 1) doesn't seem to be a
> win *ever*.

I am not suggesting that the readahead argument for
SKIP_PAGES_THRESHOLD is really valid. I think that the relfrozenxid
argument is the only one that makes any sense. Clearly both arguments
were used to justify the introduction of SKIP_PAGES_THRESHOLD, after
the earliest work on the visibility map back in 2009 -- see the commit
message for bf136cf6.

In short, I am envisaging a design that decides whether or not it'll
advance relfrozenxid based on both the costs and the benefits/need.
Under this scheme, VACUUM would either scan every
all-visible-not-all-frozen page, or scan none of them. This decision
would be almost completely independent of the decision to freeze or
not freeze pages (it'd be loosely related because FreezeLimit can
never be more than autovacuum_freeze_max_age/2 XIDs in age). Then we'd
be free to just get rid of SKIP_PAGES_THRESHOLD, which presumably
isn't doing much for readahead.

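As a rough sketch of the shape of that decision (Python, purely
illustrative): should_advance() below is a placeholder cost/benefit
rule that I made up for the example, not a proposal for the actual
cost model, and choose_pages() is likewise a hypothetical name. The
only point is the all-or-nothing treatment of the
all-visible-not-all-frozen pages.

```python
from typing import List, Tuple


def should_advance(extra_pages: int, total_pages: int,
                   table_age: int, freeze_max_age: int) -> bool:
    """Placeholder rule (an assumption, not a real cost model).

    Tolerate a larger fraction of extra page reads as the table's age
    gets closer to freeze_max_age.
    """
    if total_pages == 0:
        return True
    need = min(1.0, table_age / freeze_max_age)
    cost = extra_pages / total_pages
    return cost <= need


def choose_pages(vm: List[Tuple[bool, bool]], table_age: int,
                 freeze_max_age: int) -> Tuple[List[int], bool]:
    """Pick the block numbers to scan from (all_visible, all_frozen) bits.

    Returns (blocks_to_scan, will_advance_relfrozenxid).
    """
    # Pages that are not all-visible are always scanned.
    must_scan = [b for b, (av, _) in enumerate(vm) if not av]
    # All-visible-not-all-frozen pages: scan all of them, or none.
    optional = [b for b, (av, af) in enumerate(vm) if av and not af]
    if should_advance(len(optional), len(vm), table_age, freeze_max_age):
        return sorted(must_scan + optional), True
    return must_scan, False


vm = [(False, False), (True, False), (True, True), (True, False)]

# Young table: the extra reads aren't worth it, so skip them all.
print(choose_pages(vm, table_age=10, freeze_max_age=200))
# Older table: scan every all-visible-not-all-frozen page and advance.
print(choose_pages(vm, table_age=150, freeze_max_age=200))
```

Note that nothing here depends on run lengths of skippable pages, which
is why a rule like this would leave no role for SKIP_PAGES_THRESHOLD.
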
--
Peter Geoghegan