Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: decoupling table and index vacuum
Msg-id CA+TgmobrGs0c2W7XLh_GdW2253dwwNx4vAsvPiuLUEMhFqDB1Q@mail.gmail.com
In response to Re: decoupling table and index vacuum  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: decoupling table and index vacuum
List pgsql-hackers
On Wed, Feb 9, 2022 at 6:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> You seem to be vastly underestimating the value in being able to
> spread out and reschedule the work, and manage costs more generally.

Hmm. I think you're vastly overestimating the extent to which it's
possible to spread out and reschedule the work. I don't know which of
us is wrong. From my point of view, if VACUUM is going to do a full
phase 1 heap pass and a full phase 2 heap pass on either side of
whatever index work it does, there is no way that things are going to
get that much more dynamic than they are today. And even if we didn't
do that, in order to make any progress setting LP_DEAD pointers to
LP_UNUSED, you have to vacuum the entire index, which might be BIG. It
would be great to have a lot of granularity here but it doesn't seem
achievable.
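To make the constraint concrete, here is a minimal sketch of the ordering I'm describing (plain Python, not PostgreSQL internals; the data structures and names are invented for illustration). The point is that a dead line pointer can only become LP_UNUSED after *every* index has been scanned in full, because any index might still hold a TID pointing at the slot to be reused:

```python
# Toy model: heap maps TID -> state, each index is a set of TIDs.
# Illustrative only; real vacuum works on pages and item pointers.

def vacuum_round(heap, indexes):
    # Phase 1: scan the heap, collect dead TIDs, mark them LP_DEAD.
    dead_tids = [tid for tid, state in heap.items() if state == "dead"]
    for tid in dead_tids:
        heap[tid] = "LP_DEAD"

    # Index phase: every index must be scanned in its entirety,
    # however large, before any heap slot can be reused.
    for index in indexes:
        index.difference_update(dead_tids)

    # Phase 2: only now is it safe to mark the slots reusable.
    for tid in dead_tids:
        heap[tid] = "LP_UNUSED"

heap = {1: "live", 2: "dead", 3: "dead"}
indexes = [{1, 2, 3}, {2, 3}]
vacuum_round(heap, indexes)
# heap slots 2 and 3 are LP_UNUSED only after both full index scans
```

However you schedule the pieces, phase 2 can't run against a TID until the index phase has finished for all indexes, which is why the granularity is hard to improve.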

> > I was thinking along the lines of trying to figure out either a more
> > reliable count of dead tuples in the index, subtracting out whatever
> > we save by kill_prior_tuple and bottom-up vacuuming; or else maybe a
> > count of the subset of dead tuples that are likely not to get
> > opportunistically pruned in one way or another, if there's some way to
> > guess that.
>
> I don't know how to build something like that, since that works by
> understanding what's working, not by noticing that some existing
> strategy plainly isn't working. The only positive information that I have
> confidence in is the extreme case where you have zero index growth.
> Which is certainly possible, but perhaps not that interesting with a
> real workload.
>
> There are emergent behaviors with bottom-up deletion. Purely useful
> behaviors, as far as I know, but still very hard to precisely nail
> down. For example, Victor Yegorov came up with an adversarial
> benchmark [1] that showed that the technique dealt with index bloat
> from queue-like inserts and deletes that recycled the same distinct
> key values over time, since they happened to be mixed with non-hot
> updates. It dealt very well with that, even though *I had no clue*
> that it would work *at all*, and might have even incorrectly predicted
> the opposite if Victor had asked about it in advance.

I don't understand what your point is in these two paragraphs. I'm
just arguing that, if a raw dead tuple count is meaningless because a
lot of them are going to disappear harmlessly with or without vacuum,
it's reasonable to try to get around that problem by counting the
subset of dead tuples where that isn't true. I agree that it's unclear
how to do that, but that doesn't mean that it can't be done.
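Roughly, the accounting I have in mind looks like this (a sketch with invented counter names, not a proposal for actual statistics fields): subtract from the raw dead-tuple count whatever the opportunistic mechanisms are expected to clean up anyway, and threshold on the residue.

```python
# Hypothetical "hard dead tuple" estimate: the subset of dead tuples
# that only VACUUM can reclaim. All counters here are invented for
# the sketch; how to measure them reliably is exactly the open question.

def hard_dead_tuples(total_dead, killed_opportunistically, bottom_up_deleted):
    # Clamp at zero: the opportunistic counts may overlap or overshoot.
    return max(0, total_dead - killed_opportunistically - bottom_up_deleted)

# e.g. a million dead tuples, most of which disappear harmlessly:
hard_dead_tuples(1_000_000, 700_000, 250_000)  # -> 50000
```

A vacuum-triggering threshold on that residue would be far less noisy than one on the raw count.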

> > I realize I'm
> > hand-waving, but if the property is a property of the heap rather than
> > the index, how will different indexes get different treatment?
>
> Maybe by making the primary key growth an indicator of what is
> reasonable for the other indexes (or other B-Tree indexes) -- it has a
> natural tendency to be the least bloated possible index. If you have
> something like a GiST index, or if you have a B-Tree index that
> constantly gets non-HOT updates that logically modify an indexed
> column, then it should become reasonably obvious. Maybe there'd be
> some kind of feedback behavior to lock in "bloat prone index" for a
> time.

I have the same concern about this as what I mentioned before: it's
purely retrospective. Therefore in my mind it's a very reasonable
choice for a backstop, but a somewhat unsatisfying choice for a
primary mechanism.
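For the record, my reading of the heuristic you're describing is something like the following sketch (names and the growth-ratio threshold are invented for illustration): treat primary-key growth as the floor for "reasonable" growth, and flag any index growing much faster as bloat-prone.

```python
# Hypothetical bloat detector: the primary key has a natural tendency
# to be the least bloated possible index, so use its growth as a
# baseline. The factor of 2.0 is an arbitrary placeholder.

def bloat_prone_indexes(index_growth, pk_name="pkey", factor=2.0):
    baseline = index_growth[pk_name]
    return [name for name, growth in index_growth.items()
            if name != pk_name and growth > factor * baseline]

growth = {"pkey": 100, "idx_status": 350, "idx_created_at": 120}
bloat_prone_indexes(growth)  # -> ["idx_status"]
```

By construction it can only flag an index after the excess growth has already happened, which is the retrospective property I'm objecting to.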

-- 
Robert Haas
EDB: http://www.enterprisedb.com
