Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Msg-id CAH2-Wz=cJYdFww3FifTrLUYRwMzAVPVFuCZ0RcfMnibR94Rqng@mail.gmail.com
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
List pgsql-hackers
On Wed, Jan 18, 2023 at 12:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I don't know enough about the specifics of how this works to have an
> intelligent opinion about how likely these particular ideas are to
> work out. However, I think it's risky to look at estimates and try to
> infer whether they are reliable. It's too easy to be wrong. What we
> really want to do is anchor our estimates to some data source that we
> know we can trust absolutely. If you trust possibly-bad data less, it
> screws up your estimates more slowly, but it still screws them up.

Some of what I'm proposing arguably amounts to deliberately adding a
bias. But that's not an unreasonable thing in itself. I think of it as
related to the bias-variance tradeoff, which is a concept that comes
up a lot in machine learning and statistical inference.

We can afford to be quite imprecise at times, especially if we choose
a bias that we know has much less potential to do us harm -- some
mistakes hurt much more than others. We cannot afford to ever be
dramatically wrong, though -- especially in the direction of vacuuming
less often.
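The argument for deliberately choosing a bias when some mistakes hurt more than others can be made concrete with an asymmetric loss. The sketch below is a generic illustration, not a mechanism from this thread. It assumes a hypothetical piecewise-linear loss where underestimating the dead tuple count (vacuuming too rarely) costs three times as much as overestimating it. It then picks the estimate with the lowest average loss over some made-up plausible values. The functions `asymmetric_loss` and `best_estimate` and all numbers are illustrative assumptions.

```python
# Illustration only: a pinball-style (asymmetric linear) loss, where
# underestimating dead tuples costs more than overestimating them.
# The penalties and sample values are made-up assumptions.

def asymmetric_loss(estimate, actual, under_penalty=3.0, over_penalty=1.0):
    """Loss for one guess; underestimates are penalized more heavily."""
    if estimate < actual:
        return under_penalty * (actual - estimate)
    return over_penalty * (estimate - actual)


def best_estimate(samples, candidates, **penalties):
    """Return the candidate with the lowest mean loss over the samples."""
    def mean_loss(candidate):
        total = sum(asymmetric_loss(candidate, a, **penalties) for a in samples)
        return total / len(samples)
    return min(candidates, key=mean_loss)


# Plausible "true" dead tuple counts for some table (hypothetical data).
samples = [800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700]
unbiased = sum(samples) / len(samples)
biased = best_estimate(samples, range(0, 2001, 50))
print(unbiased, biased)  # 1250.0 1500
```

The loss-minimizing estimate lands at an upper quantile of the plausible values rather than at the unbiased mean. With symmetric penalties it falls back to the middle of the samples, and raising `under_penalty` pushes it further up.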

Besides, there is something that we *can* place a relatively high
degree of trust in that will still be in the loop here: VACUUM itself.
If VACUUM runs then it'll call pgstat_report_vacuum(), which replaces
the dead tuple count with the number that VACUUM actually observed.
That sets the record straight in the event of overestimating dead
tuples. To some degree the problem of overestimating dead tuples is
self-limiting.
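A toy model shows why an upward bias is self-limiting once VACUUM resets the statistics. This is a hypothetical simulation, not PostgreSQL code. It assumes dead tuple estimates are inflated by a constant factor, that VACUUM triggers at a fixed estimated threshold, and that each VACUUM overwrites the estimate with what it actually observed, loosely analogous to what pgstat_report_vacuum() does. The function `simulate` and its parameters are illustrative assumptions.

```python
# Toy model (assumption, not PostgreSQL code): dead tuple estimates are
# inflated by a constant factor, and each triggered VACUUM overwrites the
# estimate with the count it actually observed, loosely like
# pgstat_report_vacuum() does with the table's stats.

def simulate(increments, inflation, threshold):
    """Return (number_of_vacuums, worst_overestimate) over a workload."""
    true_dead = 0
    estimate = 0.0
    vacuums = 0
    worst = 0.0
    for inc in increments:
        true_dead += inc
        estimate += inc * inflation          # biased upward if inflation > 1
        worst = max(worst, estimate - true_dead)
        if estimate >= threshold:
            # VACUUM runs: in this model it removes every dead tuple and
            # reports the (zero) remainder, so the accumulated error is
            # discarded instead of being carried forward.
            true_dead = 0
            estimate = float(true_dead)
            vacuums += 1
    return vacuums, worst


workload = [10] * 1000                        # 10 new dead tuples per step
print(simulate(workload, 1.0, 1000))          # (10, 0.0)
print(simulate(workload, 1.5, 1000))          # (14, 335.0)
print(simulate(workload, 1.5, float("inf")))  # (0, 5000.0)
```

With a 50% overestimate, VACUUM runs somewhat more often (14 times instead of 10), but the worst-case error stays bounded by roughly one threshold's worth of inflation. Without any VACUUM to correct it, the same bias keeps growing.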

> If Andres is correct that what really matters is the number of pages
> we're going to have to dirty, we could abandon counting dead tuples
> altogether and just count not-all-visible pages in the VM map.

That's what matters most from a cost point of view IMV. So it's a big
part of the overall picture, but not everything. It tells us
relatively little about the benefits of vacuuming, except perhaps when
most pages are all-visible.
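Counting not-all-visible pages as a cost signal, while noting that it says little about benefit, could look roughly like the sketch below. It is hypothetical. PostgreSQL's visibility map keeps two bits per heap page (all-visible and all-frozen), but this model tracks only the all-visible bit as a plain list of booleans. `VmSummary` and `summarize_vm` are illustrative names, not PostgreSQL APIs.

```python
# Hypothetical sketch: treating "pages not marked all-visible in the
# visibility map" as a cost proxy. The real VM stores two bits per heap page
# (all-visible, all-frozen); here only the all-visible bit is modeled,
# as a plain list of booleans.

from dataclasses import dataclass


@dataclass
class VmSummary:
    not_all_visible_pages: int   # rough proxy for pages VACUUM may visit/dirty
    all_visible_fraction: float  # share of the table VACUUM could skip


def summarize_vm(all_visible_bits):
    """Summarize a list of per-page all-visible bits."""
    total = len(all_visible_bits)
    not_all_visible = sum(1 for bit in all_visible_bits if not bit)
    fraction = (total - not_all_visible) / total if total else 1.0
    return VmSummary(not_all_visible, fraction)


# One page in this hypothetical four-page table was recently modified.
# The summary looks the same whether that page holds thousands of dead
# tuples or none: it measures cost, not the benefit of vacuuming.
print(summarize_vm([True, False, True, True]))
# VmSummary(not_all_visible_pages=1, all_visible_fraction=0.75)
```

When `all_visible_fraction` is close to 1.0 there is also little left to gain. That matches the one case the reply singles out, where the visibility map says something about benefit too.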

-- 
Peter Geoghegan


