Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: decoupling table and index vacuum
Date
Msg-id CAH2-WzmfwcsUim032V4DQVdDJ4tdH1_HXTSfWoFhPKaR-kHJKg@mail.gmail.com
In response to Re: decoupling table and index vacuum  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Feb 10, 2022 at 11:14 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm. I think you're vastly overestimating the extent to which it's
> possible to spread out and reschedule the work. I don't know which of
> us is wrong. From my point of view, if VACUUM is going to do a full
> phase 1 heap pass and a full phase 2 heap pass on either side of
> whatever index work it does, there is no way that things are going to
> get that much more dynamic than they are today.

Waiting to vacuum each index allows us to wait until the next VACUUM
operation on the table, giving us more TIDs to remove when we do go to
vacuum one of these large indexes. Making decisions dynamically seems
very promising because it gives us the most flexibility. In principle
the workload might not allow for any of that, but in practice I think
that it will.
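
To put rough numbers on why waiting helps, here is a back-of-the-envelope sketch. It assumes that one index vacuum pass scans the whole index no matter how many TIDs it removes, and that each VACUUM of the table produces about the same number of dead TIDs. The function name and the figures are made up for illustration.

```python
def index_pages_scanned_per_dead_tid(index_pages: int,
                                     dead_tids_per_vacuum: int,
                                     vacuums_deferred: int) -> float:
    """Index pages scanned per dead TID removed, for one index vacuum pass.

    Assumption: a pass reads every index page once, so its cost is fixed
    by index size. Deferring the pass across more table VACUUMs lets more
    dead TIDs accumulate, so the fixed cost is spread over more TIDs.
    """
    accumulated_tids = dead_tids_per_vacuum * (vacuums_deferred + 1)
    return index_pages / accumulated_tids


# Hypothetical large index: 1,000,000 pages, 50,000 dead TIDs per table VACUUM.
no_wait = index_pages_scanned_per_dead_tid(1_000_000, 50_000, 0)
waited = index_pages_scanned_per_dead_tid(1_000_000, 50_000, 3)
print(no_wait, waited)
```

Under these assumptions, skipping the index for three table VACUUMs cuts the pages scanned per removed TID by a factor of four.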

> I don't understand what your point is in these two paragraphs. I'm
> just arguing that, if a raw dead tuple count is meaningless because a
> lot of them are going to disappear harmlessly with or without vacuum,
> it's reasonable to try to get around that problem by counting the
> subset of dead tuples where that isn't true. I agree that it's unclear
> how to do that, but that doesn't mean that it can't be done.

VACUUM is a participant in the system -- it sees how physical
relations are affected by the workload, but it also sees how physical
relations are affected by previous VACUUM operations. If it goes to
VACUUM an index on the basis of a relatively small difference (that
might just be noise), and does so systematically and consistently,
that might have unintended consequences. In particular, we might do
the wrong thing, again and again, because we're overinterpreting noise
again and again.

> I have the same concern about this as what I mentioned before: it's
> purely retrospective. Therefore in my mind it's a very reasonable
> choice for a backstop, but a somewhat unsatisfying choice for a
> primary mechanism.

I'm not saying that it's impossible or even unreasonable to do
something based on the current or anticipated state of the index,
exactly. Just that you have to be realistic about how accurate that
model is going to be in practice. In practice it'll be quite noisy,
and that must be accounted for. For example, we could deliberately
coarsen the information, so that only relatively large differences in
apparent-bloatedness are visible to the model.
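
One possible way to coarsen is sketched below, purely as an illustration. The "bloat ratio" input (current index size over some estimated ideal size) and the power-of-two buckets are assumptions of mine, not anything that exists in PostgreSQL.

```python
import math


def coarsen_bloat(bloat_ratio: float) -> int:
    """Map a noisy bloat estimate to a coarse power-of-two bucket.

    bloat_ratio is a hypothetical estimate: current size divided by the
    estimated ideal size, where 1.0 means no apparent bloat. Bucket 0
    covers ratios below 2, bucket 1 covers [2, 4), bucket 2 covers
    [4, 8), and so on.
    """
    if bloat_ratio <= 1.0:
        return 0
    return int(math.floor(math.log2(bloat_ratio)))


# Estimates of 1.1, 1.3 and 1.9 all look the same to the model, while
# 4.2 is clearly distinguishable from them.
print([coarsen_bloat(r) for r in (1.1, 1.3, 1.9, 2.5, 4.2)])
```

Estimates that sit right next to a bucket boundary can still flip between buckets, so this only reduces the noise problem rather than eliminating it.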

The other thing is that VACUUM itself cannot be expected to operate
with all that much precision, just because of how it works at a high
level. Any quantitative measure will only be meaningful as a way of
prioritizing work, which is going to be far easier if we make the
behavior dynamic and continually reassess. Once a relatively large
difference between two indexes first emerges, we can be relatively
confident about what to do. But smaller differences are likely just
noise.
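
As a rough sketch of what "only act on large differences" could look like, a prioritization step might be rerun on every VACUUM and pick an index only when it clearly stands out. The score, the function name, and the 2x threshold here are all hypothetical.

```python
from typing import Dict, Optional


def pick_index_to_vacuum(scores: Dict[str, float],
                         min_ratio: float = 2.0) -> Optional[str]:
    """Return the index that clearly stands out, or None if none does.

    scores maps index name to a hypothetical "needs vacuuming" score
    (higher means more urgent). The top index is chosen only when its
    score is at least min_ratio times the runner-up's; anything closer
    is treated as noise and left for the next reassessment.
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_name, top_score = ranked[0]
    if len(ranked) == 1:
        return top_name
    runner_up_score = ranked[1][1]
    if top_score > 0 and top_score >= min_ratio * runner_up_score:
        return top_name
    return None


# A small gap is ignored; a large gap leads to a decision.
print(pick_index_to_vacuum({"idx_a": 1.05, "idx_b": 1.10}))
print(pick_index_to_vacuum({"idx_a": 1.0, "idx_b": 3.0}))
```

Returning None is not a final answer here. It just means "no clear winner yet", and the same question gets asked again the next time around.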

-- 
Peter Geoghegan
