Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date
Msg-id CA+TgmoY7hnNnCZUcaeznbEu1FwS87v6zwjNzjjVjbskdz9ff9Q@mail.gmail.com
Whole thread Raw
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Tue, Jan 17, 2023 at 3:08 PM Peter Geoghegan <pg@bowt.ie> wrote:
> If you assume that there is chronic undercounting of dead tuples
> (which I think is very common), ...

Why do you think that?

> How many dead heap-only tuples are equivalent to one LP_DEAD item?
> What about page-level concentrations, and the implication for
> line-pointer bloat? I don't have a good answer to any of these
> questions myself.

Seems a bit pessimistic. If we had unlimited resources and all
operations were infinitely fast, the optimal strategy would be to
vacuum after every insert, update, or delete. But in reality, that
would be prohibitively expensive, so we're making a trade-off.
Batching together cleanup for many table modifications reduces the
amortized cost of cleaning up after one such operation very
considerably. That's critical. But if we batch too much together, then
the actual cleanup doesn't happen soon enough to keep us out of
trouble.

If we had an oracle that could provide us with perfect information,
we'd ask it, among other things, how much work will be required to
vacuum right now, and how much benefit would we get out of doing so.
The dead tuple count is related to the first question. It's not a
direct, linear relationship, but it's not completely unrelated,
either. Maybe we could refine the estimates by gathering more or
different statistics than we do now, but ultimately it's always going
to be a trade-off between getting the work done sooner (and thus maybe
preventing table growth or a wraparound shutdown) and being able to do
more work at once (and thus being more efficient). The current system
set of counters predates HOT and the visibility map, so it's not
surprising if needs updating, but if you're argue that the whole
concept is just garbage, I think that's an overreaction.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Update comments in multixact.c
Next
From: Tom Lane
Date:
Subject: Re: pgsql: Doc: add XML ID attributes to and tags.