Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: decoupling table and index vacuum
Date
Msg-id CAH2-WznkCtfv-TeNCm5P-T1LXv3wspvFmzYAXfhJ6uSJqsSGWQ@mail.gmail.com
Whole thread Raw
In response to Re: decoupling table and index vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: decoupling table and index vacuum  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Thu, Apr 22, 2021 at 1:01 PM Andres Freund <andres@anarazel.de> wrote:
> The gin case seems a bit easier than the partial index case. Keeping
> stats about the number of new entries in a GIN index doesn't seem too
> hard, nor does tracking the number of cleaned up index entries. But
> knowing which indexes are affected when a heap tuple becomes dead seems
> harder.  I guess we could just start doing a stats-only version of
> ExecInsertIndexTuples() for deletes, but obviously the cost of that is
> not enticing. Perhaps it'd not be too bad if we only did it when there's
> an index with predicates?

Though I agree that we need some handling here, I doubt that an index
with a predicate is truly a special case.

Suppose you have a partial index that covers 10% of the table. How is
that meaningfully different from an index without a predicate that is
otherwise equivalent? If the churn occurs in the same keyspace in
either case, and if that's the part of the keyspace that queries care
about, then ISTM that the practical difference is fairly
insignificant. (If you have some churn all over the standard index by
queries are only interested in the same 10% of the full keyspace then
this will be less true, but still roughly true.)

There is an understandable tendency to focus on the total size of the
index in each case, and to be alarmed that the partial index has (say)
doubled in size, while at the same time not being overly concerned
about lower *proportionate* growth for the standard index case
(assuming otherwise identical workload/conditions). The page splits
that affect the same 10% of the key space in each case will be
approximately as harmful in each case, though. We can expect the same
growth in leaf pages in each case, which will look very similar.

It should be obvious that it's somewhat of a problem that 90% of the
standard index is apparently not useful (this is an unrelated
problem). But if the DBA fixes this unrelated problem (by making the
standard index a partial index), surely it would be absurd to then
conclude that that helpful intervention somehow had the effect of
making the index bloat situation much worse!

I think that a simple heuristic could work very well here, but it
needs to be at least a little sensitive to the extremes. And I mean
all of the extremes, not just the one from my example -- every
variation exists and will cause problems if given zero weight.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: decoupling table and index vacuum
Next
From: Robert Haas
Date:
Subject: Re: pg_amcheck contrib application