Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: decoupling table and index vacuum
Date	April 23, 2021 20:04:41
Msg-id	CAH2-WznkCtfv-TeNCm5P-T1LXv3wspvFmzYAXfhJ6uSJqsSGWQ@mail.gmail.com
In response to	Re: decoupling table and index vacuum (Andres Freund <andres@anarazel.de>)
Responses	Re: decoupling table and index vacuum
List	pgsql-hackers

On Thu, Apr 22, 2021 at 1:01 PM Andres Freund <andres@anarazel.de> wrote:
> The gin case seems a bit easier than the partial index case. Keeping
> stats about the number of new entries in a GIN index doesn't seem too
> hard, nor does tracking the number of cleaned up index entries. But
> knowing which indexes are affected when a heap tuple becomes dead seems
> harder.  I guess we could just start doing a stats-only version of
> ExecInsertIndexTuples() for deletes, but obviously the cost of that is
> not enticing. Perhaps it'd not be too bad if we only did it when there's
> an index with predicates?

Though I agree that we need some handling here, I doubt that an index
with a predicate is truly a special case.

Suppose you have a partial index that covers 10% of the table. How is
that meaningfully different from an index without a predicate that is
otherwise equivalent? If the churn occurs in the same keyspace in
either case, and if that's the part of the keyspace that queries care
about, then ISTM that the practical difference is fairly
insignificant. (If you have some churn all over the standard index but
queries are only interested in the same 10% of the full keyspace then
this will be less true, but still roughly true.)

There is an understandable tendency to focus on the total size of the
index in each case, and to be alarmed that the partial index has (say)
doubled in size, while at the same time not being overly concerned
about lower *proportionate* growth for the standard index case
(assuming otherwise identical workload/conditions). The page splits
that affect the same 10% of the key space in each case will be
approximately as harmful in each case, though. We can expect the same
growth in leaf pages in each case, which will look very similar.
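
To put rough numbers on the point above, here is a minimal sketch. The
page counts are hypothetical, chosen only for illustration, and the
growth() helper is not part of PostgreSQL. It assumes the same number of
new leaf pages from page splits in the hot 10% of the key space for both
indexes:

```python
# Hypothetical numbers, chosen only to illustrate the argument above.
def growth(leaf_pages_before, new_leaf_pages):
    """Return (absolute growth in pages, proportionate growth)."""
    return new_leaf_pages, new_leaf_pages / leaf_pages_before

STANDARD_LEAF_PAGES = 1000  # assumed: index over the whole table
PARTIAL_LEAF_PAGES = 100    # assumed: partial index covering the hot 10%
SPLIT_PAGES = 100           # assumed: leaf pages added by splits in the hot 10%

standard_abs, standard_rel = growth(STANDARD_LEAF_PAGES, SPLIT_PAGES)
partial_abs, partial_rel = growth(PARTIAL_LEAF_PAGES, SPLIT_PAGES)

print(f"standard: +{standard_abs} pages ({standard_rel:.0%})")
print(f"partial:  +{partial_abs} pages ({partial_rel:.0%})")
```

Under these assumptions the absolute growth is identical (+100 leaf
pages), even though the partial index has doubled in size while the
standard index has grown by only 10%.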

It should be obvious that it's somewhat of a problem that 90% of the
standard index is apparently not useful (this is an unrelated
problem). But if the DBA fixes this unrelated problem (by making the
standard index a partial index), surely it would be absurd to then
conclude that that helpful intervention somehow had the effect of
making the index bloat situation much worse!

I think that a simple heuristic could work very well here, but it
needs to be at least a little sensitive to the extremes. And I mean
all of the extremes, not just the one from my example -- every
variation exists and will cause problems if given zero weight.

-- 
Peter Geoghegan
