Re: autovacuum prioritization - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: autovacuum prioritization
Msg-id CAH2-Wzk+r+YZWP+a-ryQ=Oyf+V6fw3OqvJqGduic=Qoyuj-7fQ@mail.gmail.com
In response to Re: autovacuum prioritization  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Jan 20, 2022 at 4:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > Since we now have the failsafe, the scheduling algorithm can afford to
> > not give too much special attention to table age until we're maybe
> > over the 1 billion age mark -- or even 1.5 billion+. But once the
> > scheduling stuff starts to give table age special attention, it should
> > probably become the dominant consideration, by far, completely
> > drowning out any signals about bloat. It's kinda never really supposed
> > to get that high, so when we do end up there it is reasonable to fully
> > freak out. Unlike the bloat criteria, the wraparound safety criteria
> > doesn't seem to have much recognizable space between not worrying at
> > all, and freaking out.
>
> I do not agree with all of this. First, on general principle, I think
> sharp edges are bad. If a table had priority 0 for autovacuum 10
> minutes ago, it can't now have priority one million bazillion. If
> you're saying that the priority of wraparound needs to, in the limit,
> become higher than any bloat-based priority, that is reasonable.

I'm definitely saying that considerations about wraparound need to swamp
everything else at the limit. But I'm also making the point that (at
least with the ongoing relfrozenxid/freezing work) the system does
remarkably well at avoiding aggressive anti-wraparound VACUUMs
altogether for most individual tables, under most workloads. And so
having an aggressive anti-wraparound VACUUM at all now becomes a pretty
strong signal.

As we discussed on the other thread recently, with the patches in place
you're still only going to get anti-wraparound VACUUMs in a minority of
tables -- tables that won't ever get an autovacuum for any other
reason. And so having an anti-wraparound VACUUM probably just signals
that we have such a table, which is totally inconsequential. But what
about when there is an anti-wraparound VACUUM (or a need for one) on a
table whose age is already (say) 2x the value of
autovacuum_freeze_max_age? That really is an incredibly strong signal
that something is very much amiss. Since the relfrozenxid/freezing
patch series actually makes each VACUUM able to advance relfrozenxid
in a way that's really robust when the system is not under great
pressure, the failure of that strategy becomes a really strong signal.
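
To make that concrete, here is one way to spot tables that are already
past that kind of threshold, using nothing but the catalogs (the 2x
multiplier is just the example figure from above, not anything the
system itself uses):

  -- Tables whose relfrozenxid age is more than twice
  -- autovacuum_freeze_max_age, i.e. well past the point where an
  -- anti-wraparound VACUUM should already have sorted things out
  SELECT c.oid::regclass AS table_name,
         age(c.relfrozenxid) AS xid_age,
         current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
  FROM pg_class c
  WHERE c.relkind IN ('r', 'm', 't')
    AND age(c.relfrozenxid) >
        2 * current_setting('autovacuum_freeze_max_age')::bigint
  ORDER BY age(c.relfrozenxid) DESC;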

So it's not that table age by itself signals something that we can
generalize about, without context; the context is important. With the
new strategy from my patch series, the relationship between table age
and autovacuum_freeze_max_age becomes an important negative signal:
something that we reasonably expected to be quite stable turns out not
to be stable after all.

(Sorry to keep going on about my work, but it really seems relevant.)

> Also, it's worth keeping in mind that waiting longer to freak out is
> not necessarily an advantage. It may well be that the only way the
> problem will ever get resolved is by human intervention - going in and
> fixing whatever dumb thing somebody did - e.g. resolving the pending
> prepared transaction.

In that case we ought to try to alert the user earlier.

> Those are fair concerns. I assumed that if we knew the number of pages
> in the index, which we do, it wouldn't be too hard to make an estimate
> like this ... but you know more about this than I do, so tell me why
> you think that won't work. It's perhaps worth noting that even a
> somewhat poor estimate could be a big improvement over what we have
> now.

I can construct a plausible, totally realistic counter-example that
breaks a heuristic like that, unless it focuses on extremes only, like
no index growth at all since the last VACUUM (which didn't leave
behind any deleted pages). I think that such a model can work well,
but only if it's designed to matter less and less as our uncertainty
grows. It seems as if the uncertainty grows very sharply, once you
begin to generalize past the extremes.

We have to be totally prepared for the model to be wrong, except
perhaps as a way of prioritizing things when there is real urgency and
we have no choice but to pick something. All models are wrong; some
are useful.

> The problem that I'm principally concerned about here is the case
> where somebody had a system that was basically OK and then at some
> point, bad things started to happen.

It seems necessary to distinguish between the case where things really
were okay for a time, and the case where they merely appeared to be
okay to somebody whose understanding of the system isn't impossibly
deep and sophisticated. You'd have to be an all-knowing oracle to be
able to tell the difference, because the system itself has no
sophisticated notion of how far it is into debt. There are things that
we can do to address this gap directly (that's what I have been doing
myself), but that can only go so far.

ISTM that the higher the amount of debt that the system is actually
in, the greater the uncertainty about the total amount of debt. In
other words, the advantage of paying down debt isn't limited to the
obvious stuff; there is also the advantage of gaining confidence about
how far into debt the system really is. The longer it's been since the
last real VACUUM, the more your model of debt/bloat is likely to have
diverged from reality.

And that's why I bring costs into it. Vacuuming at night, because you
know that the cost will be relatively low (even if the benefits might
not be quite as high as you'd usually expect), makes sense on its own
terms, and it also has the advantage of making the overall picture
clearer to the system/your model.

> At some point they realize
> they're in trouble and try to get back on track. Very often,
> autovacuum is actually the enemy in that situation: it insists on
> consuming resources to vacuum the wrong stuff.

To some degree this is because the statistics that autovacuum has
access to are flat out wrong, even though we could do better. For
example, there's the issue that I highlighted a while back about
ANALYZE's dead tuples accounting, or the issue that I pointed out on
this thread already, about relfrozenxid being a very bad indicator of
what's actually going on with XIDs in the table (at least without my
relfrozenxid patches in place).
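
If anyone wants to see the gap for themselves, the numbers autovacuum
is actually working from are visible in pg_stat_user_tables, and it's
easy to line them up against relfrozenxid:

  -- What the stats subsystem currently believes about each table, next
  -- to its relfrozenxid age; n_dead_tup in particular can be quite far
  -- from reality (e.g. because of the ANALYZE accounting issue above)
  SELECT s.relname,
         s.n_live_tup,
         s.n_dead_tup,
         s.last_vacuum,
         s.last_autovacuum,
         age(c.relfrozenxid) AS xid_age
  FROM pg_stat_user_tables s
  JOIN pg_class c ON c.oid = s.relid
  ORDER BY s.n_dead_tup DESC;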

Another idea centered on costs: with my freezing/relfrozenxid patch
series, strict append-only tables like pgbench_history will only ever
need to have VACUUM process each heap page once. That's good, but it
could be even better if we didn't have to rely on the autovacuum
scheduling and autovacuum_vacuum_insert_scale_factor to drive
everything. This is technically a special case, but it's a rather
important one -- it's both very common and not that hard to do a lot
better on. We ought to be aiming to dirty each page exactly once, by
*dynamically* deciding to VACUUM much more often than the current
model considers sensible.
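
To be clear, the closest you can get to that today is fiddling with
per-table reloptions, something like this (the numbers are only for
illustration):

  -- Rough approximation of "vacuum again after a smallish fixed amount
  -- of growth" under the current model: zero out the percentage-based
  -- part and rely on a fixed insert threshold instead
  ALTER TABLE pgbench_history SET (
      autovacuum_vacuum_insert_scale_factor = 0.0,
      autovacuum_vacuum_insert_threshold = 100000
  );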

I think that this would require a two-way dialog between autovacuum.c
and vacuumlazy.c. At a high level, vacuumlazy.c would report back
"turns out that that table looks very much like an append-only table".
That feedback would cause autovacuum.c's scheduling to eagerly launch
another autovacuum worker, ignoring the usual criteria -- just wait
(say) another 60 seconds, and then launch a new autovacuum worker
against the same table if it has grown by some smallish fixed amount
(stop caring about percentage table growth). Constant mini-vacuums
against such a table make sense, since costs are almost exactly
proportional to the number of heap pages appended since the last
VACUUM.
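
For comparison, the current insert-driven trigger is percentage based,
which you can see directly from the stats; the hypothetical
fixed-growth trigger I'm describing would just drop the reltuples term
(the 100000 below is a made-up constant, purely to show the shape of
it):

  -- Current model (simplified): an insert-driven autovacuum fires once
  --   n_ins_since_vacuum > autovacuum_vacuum_insert_threshold
  --                        + autovacuum_vacuum_insert_scale_factor * reltuples
  -- The "constant mini-vacuum" idea would stop caring about reltuples
  SELECT s.relname,
         s.n_ins_since_vacuum,
         current_setting('autovacuum_vacuum_insert_threshold')::bigint
           + (current_setting('autovacuum_vacuum_insert_scale_factor')::float8
              * c.reltuples)::bigint AS current_trigger_point,
         100000 AS hypothetical_fixed_trigger
  FROM pg_stat_user_tables s
  JOIN pg_class c ON c.oid = s.relid;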

-- 
Peter Geoghegan


