Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: should vacuum's first heap pass be read-only?
Msg-id CAH2-WzkYFoHKzu4itW75ZDYnhMOFbdGbW0t4UsKOS6pV9BsoVA@mail.gmail.com
In response to Re: should vacuum's first heap pass be read-only?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: should vacuum's first heap pass be read-only?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Apr 1, 2022 at 11:04 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I guess you're right, and it's actually a little bit better than that,
> because even if the data does fit into shared memory, we'll have to
> pass fewer TIDs to the worker to be removed from the heap, which might
> save a few CPU cycles. But I think both of those are very small
> benefits.

I'm not following. It seems like you're saying that the ability to
vacuum indexes on their own schedule (based on their own needs) is not
sufficiently compelling. I think it's very compelling once a table has
enough indexes (and that may not be very many).

The conveyor belt doesn't just save I/O from repeated scanning of the
heap. It may also save on repeated pruning (or just dirtying) of the
same heap pages again and again, for very little benefit.
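The conveyor belt idea can be sketched as a toy, in-memory model. This is not the proposed patch: the class and method names are hypothetical, TIDs are plain (block, offset) pairs, and the on-disk conveyor belt storage is not modeled at all. It only illustrates the property being argued for here: dead TIDs are collected from the heap once, each index consumes them on its own schedule, and heap line pointers become reclaimable only after every index has caught up.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of a dead-TID "conveyor belt" (names are illustrative).
# The first heap pass appends each dead TID once; every index consumes the belt
# at its own pace; heap line pointers are reclaimable only once all indexes
# have moved past them.

TID = tuple[int, int]  # (heap block number, line pointer offset)


@dataclass
class ConveyorBelt:
    index_names: list[str]
    tids: list[TID] = field(default_factory=list)
    # How far along the belt each index has been vacuumed.
    positions: dict[str, int] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        # Each index starts out having consumed nothing.
        self.positions = {name: 0 for name in self.index_names}

    def append(self, tid: TID) -> None:
        """First heap pass: record a dead TID once, so no heap rescan is needed."""
        self.tids.append(tid)

    def vacuum_index(self, name: str) -> list[TID]:
        """Vacuum one index on its own schedule; return the TIDs it processed."""
        start = self.positions[name]
        batch = self.tids[start:]
        # A real implementation would delete index entries pointing at `batch`.
        self.positions[name] = len(self.tids)
        return batch

    def reclaimable(self) -> list[TID]:
        """TIDs no index still references, so the heap may mark them unused."""
        done = min(self.positions.values(), default=len(self.tids))
        return self.tids[:done]


# Usage: a bloated index is vacuumed eagerly, another index waits.
belt = ConveyorBelt(["pkey", "bloated_idx", "rarely_needed_idx"])
belt.append((10, 1))
belt.append((10, 2))
belt.vacuum_index("bloated_idx")
belt.append((42, 7))
belt.vacuum_index("bloated_idx")
belt.vacuum_index("pkey")
print(belt.reclaimable())  # [] because rarely_needed_idx has not caught up yet
```

Tracking a separate position per index, rather than one shared cursor, is what lets a heavily bloated index be vacuumed often while the others wait, without rescanning or re-pruning the heap each time.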

Imagine an append-only table where 1% of inserting transactions
abort. You really want to be able to constantly VACUUM such a table,
so that its pages are proactively frozen and set all-visible in the
visibility map -- it's not that different from a perfectly append-only
table, without any garbage tuples. And so it would be very useful if
we could delay index vacuuming for much longer than the current
2%-of-rel_pages heuristic seems to allow.
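A simplified model of that heuristic, assuming it works by comparing the number of heap pages that contain LP_DEAD items against 2% of rel_pages and skipping index vacuuming when the count is below that threshold. The function and constant names here are hypothetical, and any other conditions the real check in vacuumlazy.c applies are deliberately left out of this sketch.

```python
# Hypothetical simplified model of VACUUM's index-vacuuming bypass heuristic.
# Assumption: index vacuuming is skipped when fewer than 2% of the table's
# heap pages (rel_pages) contain LP_DEAD items. Any other conditions applied
# by the real code are omitted.

BYPASS_THRESHOLD_FRACTION = 0.02  # i.e. 2% of rel_pages


def bypass_index_vacuuming(lpdead_item_pages: int, rel_pages: int) -> bool:
    """Return True if, under this model, VACUUM would skip index vacuuming."""
    threshold = rel_pages * BYPASS_THRESHOLD_FRACTION
    return lpdead_item_pages < threshold


# An append-mostly table of 1,000,000 pages where aborted inserts left
# LP_DEAD items on 15,000 pages (1.5%): index vacuuming is bypassed.
print(bypass_index_vacuuming(15_000, 1_000_000))  # True
# If the aborted rows were scattered across 25,000 pages (2.5%), it is not.
print(bypass_index_vacuuming(25_000, 1_000_000))  # False
```

Because the model counts pages rather than dead tuples, how the aborted rows are spread across the heap decides whether a given VACUUM can put off index vacuuming.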

That heuristic has to conservatively assume that it might be some time
before the next vacuum is launched and has the opportunity to
reconsider index vacuuming. What if index vacuuming were a more or
less independent question instead? To put it another way, it would be
great if the scheduling code for autovacuum could make inferences
about what general strategy works best for a given table over time. To
do that sensibly, the algorithm needs more context, so that it can
course-correct without paying much of a cost for being wrong.

-- 
Peter Geoghegan


