From: Peter Geoghegan
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date:
Msg-id: CAH2-Wzk3KKS-82n5N5tR6JZRWS-XHmUE=BgiVPQA4S0BM2tNXg@mail.gmail.com
In response to: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation (Robert Haas <robertmhaas@gmail.com>)
On Wed, Jan 18, 2023 at 11:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > pgstat_report_analyze() will totally override the
> > tabentry->dead_tuples information that drives autovacuum.c, based on
> > an estimate derived from a random sample -- which seems to me to be an
> > approach that just doesn't have any sound theoretical basis.
>
> Yikes. I think we don't have a choice but to have a method to correct
> the information somehow, because AFAIK the statistics system is not
> crash-safe. But that approach does seem to carry significant risk of
> overwriting correct information with wrong information.

This situation is really quite awful, so maybe we should do something
about it soon, as part of the Postgres 16 autovacuum work that is
already underway. In fact I think that the problem here is so bad that
even something slightly less naive would be a big improvement.

You're right to point out that pgstat_report_analyze needs to update
the stats in case there is a hard crash, of course. But there is
plenty of context with which to make better decisions close at hand.
For example, I bet that pgstat_report_analyze already does a pretty
good job of estimating live_tuples -- my spiel about statistics mostly
doesn't apply to live_tuples. Suppose that we notice that its new
estimate for live_tuples approximately matches what the stats
subsystem already thought about live_tuples, while dead_tuples is far,
far lower. We shouldn't be so credulous as to believe the new
dead_tuples estimate at that point.

Perhaps we can change nothing about dead_tuples at all when this
happens. Or perhaps we can set dead_tuples to a value that is scaled
from the old estimate: take the old dead_tuples-to-live_tuples ratio,
and apply it to the new live_tuples estimate to get a new dead_tuples
value. This is just a first pass, to see what you and others think.
Even very simple heuristics seem like they could make things much
better.
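
To make that concrete, here's the rough shape of what I have in mind
(purely illustrative, not a patch; the function name and the 10%/2x
cutoffs are just placeholders):

#include <math.h>

/*
 * Illustrative sketch only, not actual PostgreSQL code: decide what
 * dead_tuples value pgstat_report_analyze() could store, given the old
 * (incrementally maintained) and new (sample-derived) estimates.
 */
static double
choose_dead_tuples(double old_live, double old_dead,
                   double new_live, double new_dead)
{
    /* Only intervene when the live_tuples estimates roughly agree */
    if (old_live > 0 && fabs(new_live - old_live) / old_live < 0.10)
    {
        /* Carry the old dead:live ratio over to the new live_tuples */
        double      scaled_dead = new_live * (old_dead / old_live);

        /* Distrust a new estimate that is dramatically lower than that */
        if (new_dead < scaled_dead / 2.0)
            return scaled_dead;
    }

    /* Otherwise accept the new sample-derived estimate as-is */
    return new_dead;
}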

Another angle of attack is the PD_ALL_VISIBLE page-level bit, which
acquire_sample_rows() could pay attention to -- especially in larger
tables, where the difference between all pages and just the
all-visible subset of pages is most likely to matter. The more sampled
pages that had PD_ALL_VISIBLE set, the less credible the new
dead_tuples estimate is (relative to existing information), and so the
more heavily pgstat_report_analyze() should weight the old estimate
over the new one.
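
Something like the following, say (again just a sketch of the weighting
idea; none of these names exist today, and the linear blend is
arbitrary):

/*
 * Illustrative sketch only: suppose acquire_sample_rows() counted how
 * many of the pages it sampled had PD_ALL_VISIBLE set, and we blended
 * the old and new dead_tuples estimates accordingly.
 */
static double
blend_dead_tuples(double old_dead, double new_dead,
                  int sampled_pages, int sampled_allvisible)
{
    double      allvis_frac;

    if (sampled_pages <= 0)
        return new_dead;

    /* Fraction of the sample that was already all-visible */
    allvis_frac = (double) sampled_allvisible / sampled_pages;

    /*
     * The more all-visible pages the sample hit, the less it tells us
     * about dead tuples, so lean correspondingly harder on the existing
     * estimate.
     */
    return allvis_frac * old_dead + (1.0 - allvis_frac) * new_dead;
}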

We probably shouldn't change anything about pgstat_report_vacuum as
part of this effort to make pgstat_report_analyze less terrible in the
near term. It certainly has its problems (what is true for pages that
VACUUM scanned at the end of VACUUM is far from representative of new
pages!), but it's probably much less of a contributor to issues like those
that Andres reports seeing.

BTW, one of the nice things about the insert-driven autovacuum stats
is that pgstat_report_analyze doesn't have an opinion about how many
tuples were inserted since the last VACUUM ran. The insert-driven stats
do have other problems, but those seem less serious to me.

> Yep. I think what we should try to evaluate is which number is
> furthest from the truth. My guess is that the threshold is so high
> relative to what a reasonable value would be that you can't get any
> benefit out of making the dead tuple count more accurate. Like, if the
> threshold is 100x too high, or something, then who cares how accurate
> the dead tuples number is?

Right. Or if we don't make any reasonable distinction between LP_DEAD
items and dead heap-only tuples, then the total number of both things
together may matter very little. Better to be approximately correct
than exactly wrong. Deliberately introducing a bias to lower the
variance is a perfectly valid approach.
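
To put rough numbers on that: with the default
autovacuum_vacuum_scale_factor of 0.2, a table with a billion live
tuples won't get a dead-tuple-driven autovacuum until the reported
dead_tuples crosses roughly 200 million -- at which point it hardly
matters whether the estimate is off by 10% or by 30%.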

> Maybe that's even true in some scenarios, but I bet that
> it's never the issue when people have really big tables. The fact that
> I'm OK with 10MB of bloat in my 100MB table doesn't mean I'm OK with
> 1TB of bloat in my 10TB table. Among other problems, I can't even
> vacuum away that much bloat in one index pass, because autovacuum
> can't use enough work memory for that. Also, the absolute space
> wastage matters.

I certainly agree with all that.

FWIW, part of my mental model with VACUUM is that the rules kind of
change in the case of a big table. We're far more vulnerable to issues
such as (say) waiting for cleanup locks because the overall cadence
used by autovacuum is so infrequent relative to everything else.
There are more opportunities for things to go wrong, worse
consequences when they do go wrong, and greater potential for the
problems to compound.

-- 
Peter Geoghegan


