Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: decoupling table and index vacuum
Msg-id CAH2-Wz==yKxVXdpyQXdhGG00NimRsRd6JMfN+ev91Uka02r5RQ@mail.gmail.com
In response to Re: decoupling table and index vacuum  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Feb 8, 2022 at 10:58 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Right, that's why I asked the question. If we're going to ask the
> index AM whether it would like to be vacuumed right now, we're going
> to have to put some logic into the index AM that knows how to answer
> that question. But if we don't have any useful statistics that would
> let us answer the question correctly, then we have problems.

I have very little faith in the use of statistical sampling for
anything involving vacuuming. In fact, I think that the current way in
which ANALYZE counts dead tuples is a misapplication of statistics. It
isn't even wrong. One of the things that I really like about this
project is that it can plausibly solve that problem by splitting up
the work of VACUUM, at low cost -- it's less top-down. Not only do you
get the obvious benefits with preventing bloat; you also get
*continual* feedback about the actual physical reality in the table
(and indexes, to a lesser extent). As I said recently, right now the
more bloat we have, the more uncertainty there is about the total
amount of bloat. We need to control both the bloat and the uncertainty
about the bloat.

The basic high level idea behind how the optimizer uses statistics
involves the assumption that *all* the rows in the table are
*themselves* a sample taken from some larger distribution -- something
from the real physical world (meeting this assumption is one reason
why database/schema normalization really matters). And so on a good
week it probably won't matter too much to the optimizer if ANALYZE
doesn't run until the table size doubles (for a table that was already
quite large). These are pretty delicate assumptions that (from the
point of view of the optimizer) work out surprisingly well in
practice.

Bloat just isn't like that. Dead tuples are fundamentally cyclic and
dynamic in nature -- conventional statistics just won't work with
something like that. Worst of all, the process that counts dead tuples
(ANALYZE) is really an active participant in the system -- the whole
purpose of even looking is to *reduce* the number of dead
tuples by making an autovacuum run. That's deeply weird.

> The point is that it's a continuum. If we decide that we're asking the
> index "do you want extra vacuuming?" then that phrasing suggests that
> you should only say yes if you really need it. If we decide we're
> asking the index "can we skip vacuuming you this time?" then the
> phrasing suggests that you should not feel bad about insisting on a
> vacuum right now, and only surrender your claim if you're sure you
> don't need it. But in reality, no bias either way is warranted.

Actually, I think that this particular bias *is* warranted. We should
openly and plainly be biased in the direction of causing the least
harm. What's wrong with that? Having accurate information is not an
intrinsic good. I even think that having more information can be
strictly worse, because you might actually believe it. Variance
matters a lot -- the bias/variance tradeoff is pretty fundamental
here.

I'm also saying some of this stuff because of broader VACUUM design
considerations. VACUUM fundamentally has to work at the table level,
and I don't see that changing. The approach of making autovacuum do
something akin to a plain VACUUM command in the simplest cases, and
only later do some extra "dynamic mini vacuums" (which pick up where
the VACUUM-command-style VACUUM left off) has a lot to recommend it. This
approach allows most of the current autovacuum settings to continue to
work in roughly the same way. They just need to have their
documentation updated to make it clear that they're about the worst
case.
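
Schematically, the flow I'm imagining looks something like the toy C
below. Every name here is invented for illustration -- nothing like it
exists in PostgreSQL today -- and the point is only the control flow:
the worst case behaves like a plain VACUUM command, while the common
case runs per-index work driven by index AM feedback.

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy sketch of the proposed split.  None of these types or functions
 * exist in PostgreSQL; all names are made up for this example.
 */
typedef struct
{
    const char *name;
    bool        wants_mini_vacuum;  /* the index AM's answer */
} FakeIndex;

typedef struct
{
    bool        exceeds_worst_case; /* existing autovacuum thresholds */
    int         nindexes;
    FakeIndex  *indexes;
} FakeTable;

static void
autovacuum_one_table(FakeTable *table)
{
    if (table->exceeds_worst_case)
    {
        /* Simple case: same behavior as today's table-level VACUUM. */
        printf("plain VACUUM of table and all indexes\n");
        return;
    }

    /*
     * Common case: ask each index AM whether it wants extra work, and
     * do only that, picking up where the last full VACUUM left off.
     * Running these also provides continual feedback about the actual
     * physical state of each index.
     */
    for (int i = 0; i < table->nindexes; i++)
    {
        if (table->indexes[i].wants_mini_vacuum)
            printf("mini vacuum of index %s\n", table->indexes[i].name);
    }
}

int
main(void)
{
    FakeIndex   indexes[] = {{"pk_idx", false}, {"growing_idx", true}};
    FakeTable   table = {false, 2, indexes};

    autovacuum_one_table(&table);   /* prints: mini vacuum of index growing_idx */
    return 0;
}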

> To expand on that just a bit, if I'm a btree index and someone asks me
> "can we skip vacuuming you this time?" I might say "return dead_tups <
> tiny_amount" and if they ask me "do you want extra vacuuming" I might
> say "return dead_tups > quite_large_amount". But if they ask me
> "should we vacuum you now?" then I might say "return dead_tups >
> moderate_amount" which feels like the correct thing here.

The btree side of this shouldn't care at all about dead tuples (in
general we focus way too much on dead tuples, and way too little on
pages). With bottom-up index deletion the number of dead tuples in the
index is just about completely irrelevant. It's entirely possible and
often even likely that 20%+ of all index tuples will be dead at any
one time, even while the optimization perfectly preserves the index
structure.

The btree side of the index AM API should be focusing on the growth
in index size, relative to some expectation (like maybe the growth of
whichever index on the same table has grown the least since last time,
accounting for obvious special cases like partial indexes). Perhaps
we'd give some consideration to bulk deletes, too. Overall, it should
be pretty simple, and should sometimes force us to do one of these
"dynamic mini vacuums" of the index just because we're not quite sure
what to do. There is nothing wrong with admitting the uncertainty.
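
To make that more concrete, here is a rough standalone sketch of the
kind of growth-based check I have in mind. All of the names and
thresholds are made up for illustration -- nothing here is part of the
real index AM API:

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical inputs that the scheduler might hand to the btree AM.
 * These names are invented; nothing here is an actual PostgreSQL API.
 */
typedef struct
{
    double  blocks_at_last_vacuum;  /* index size when last vacuumed */
    double  blocks_now;             /* current index size */
    double  least_sibling_growth;   /* smallest growth ratio among the
                                     * other comparable indexes on the
                                     * same table (ignoring special
                                     * cases like partial indexes) */
    bool    had_bulk_delete;        /* bulk deletes since last vacuum? */
} IndexGrowthHint;

typedef enum
{
    VERDICT_SKIP,       /* structure looks preserved; no vacuum needed */
    VERDICT_UNSURE,     /* not quite sure: do a mini vacuum anyway */
    VERDICT_VACUUM      /* clearly outgrowing expectations: vacuum now */
} VacuumVerdict;

static VacuumVerdict
btree_growth_verdict(const IndexGrowthHint *hint)
{
    double  growth = hint->blocks_now /
        fmax(hint->blocks_at_last_vacuum, 1.0);

    /* Give bulk deletes some consideration, whatever the size says. */
    if (hint->had_bulk_delete)
        return VERDICT_VACUUM;

    /*
     * Growing no faster than the least-grown sibling index suggests
     * bottom-up deletion is keeping the structure intact, no matter
     * how many dead tuples the index contains right now.
     */
    if (growth <= hint->least_sibling_growth * 1.05)
        return VERDICT_SKIP;

    if (growth >= hint->least_sibling_growth * 1.5)
        return VERDICT_VACUUM;

    /* In between: admit the uncertainty; a mini vacuum resolves it. */
    return VERDICT_UNSURE;
}

int
main(void)
{
    /* Index grew 80% while its least-grown sibling grew only 2%. */
    IndexGrowthHint hint = {1000.0, 1800.0, 1.02, false};

    printf("verdict: %d\n", btree_growth_verdict(&hint)); /* 2 = VERDICT_VACUUM */
    return 0;
}

The 1.05x and 1.5x cutoffs are arbitrary placeholders. The point is
just that the verdict is based on relative growth rather than dead
tuple counts, and that "unsure" is a perfectly acceptable answer that
simply triggers a mini vacuum.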

-- 
Peter Geoghegan


