Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: decoupling table and index vacuum
Date
Msg-id CA+TgmoYNbEsoW36WJONxG4xRLQmDdPQaJMG9WBDWgGSyHkg32w@mail.gmail.com
Whole thread Raw
In response to Re: decoupling table and index vacuum  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: decoupling table and index vacuum  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Tue, Feb 8, 2022 at 12:50 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > It's not clear to me that we have enough information to make good
> > decisions about which indexes to vacuum and which indexes to skip.
>
> What if "extra vacuuming, not skipping vacuuming" was not just an
> abstract goal, but an actual first-class part of the implementation,
> and the index AM API? Then the question we're asking the index/index
> AM is no longer "Do you [an index] *not* require index vacuuming, even
> though you are entitled to it according to the conventional rules of
> autovacuum scheduling?". The question is instead more like "Could you
> use an extra, early VACUUM?".
>
> if we invert the question like this then we have something that makes
> more sense at the index AM level, but requires significant
> improvements at the level of autovacuum scheduling. On the other hand
> I think that you already need to do at least some work in that area.

Right, that's why I asked the question. If we're going to ask the
index AM whether it would like to be vacuumed right now, we're going
to have to put some logic into the index AM that knows how to answer
that question. But if we don't have any useful statistics that would
let us answer the question correctly, then we have problems.

While I basically agree with everything that you just wrote, I'm
somewhat inclined to think that the question is not best phrased as
either extra-vacuum or skip-a-vacuum. Either of those supposes a
normative amount of vacuuming from which we could deviate in one
direction or the other. I think it would be better to phrase it in a
way that doesn't make such a supposition. Maybe something like: "Hi,
we are vacuuming the heap right now and we are also going to vacuum
any indexes that would like it, and does that include you?"

The point is that it's a continuum. If we decide that we're asking the
index "do you want extra vacuuming?" then that phrasing suggests that
you should only say yes if you really need it. If we decide we're
asking the index "can we skip vacuuming you this time?" then the
phrasing suggests that you should not feel bad about insisting on a
vacuum right now, and only surrender your claim if you're sure you
don't need it. But in reality, no bias either way is warranted. It is
either better that this index should be vacuumed right now, or better
that it should not be vacuumed right now, and whichever is better
should be what we choose to do.

To expand on that just a bit, if I'm a btree index and someone asks me
"can we skip vacuuming you this time?" I might say "return dead_tups <
tiny_amount" and if they ask me "do you want extra vacuuming" I might
say "return dead_tups > quite_large_amount". But if they ask me
"should we vacuum you now?" then I might say "return dead_tups >
moderate_amount" which feels like the correct thing here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: is the base backup protocol used by out-of-core tools?
Next
From: Robert Haas
Date:
Subject: Re: is the base backup protocol used by out-of-core tools?