Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: decoupling table and index vacuum
Msg-id CAH2-WznT4q9JOOU5q1ifzqyPWnw-Wbt9_8NWZ5xaNApsx9-EfA@mail.gmail.com
In response to Re: decoupling table and index vacuum  (Robert Haas <robertmhaas@gmail.com>)
On Wed, Feb 9, 2022 at 1:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I'm not sure that we can. I mean, there's still only going to be ~3
> autovacuum workers, and there could be arbitrarily many tables. Even
> if the vacuum load is within the bounds of what the system can
> sustain, individual tables can't be assured of being visited
> frequently (or so it seems to me), and it could be that there are
> actually not enough resources to vacuum everything, in which case we
> have to try to cope as best we can. Less unnecessary vacuuming of
> large indexes can help, of
> course, but I'm not sure it fundamentally changes the calculus.

You seem to be vastly underestimating the value in being able to
spread out and reschedule the work, and manage costs more generally.
If you can multiplex autovacuum workers across tables, by splitting up
work across a table's indexes over time, then it might not matter at all
that you only have 3 workers. If you can spread out the work over
time, then you make things much cheaper (fewer FPIs by aligning to
checkpoint boundaries). And, because you have a schedule that can be
dynamically updated, you get to update your global view of the world
(not just one table) before you've fully committed to it -- if you
provisionally say that you think that a certain index won't need to be
vacuumed for a long time, that isn't the last word anymore.
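
To make that concrete, here's a rough sketch of what I mean by one
shared, dynamically updatable schedule (just illustrative Python;
every name here is invented, and autovacuum obviously isn't structured
this way today):

    import heapq

    class IndexVacuumTask:
        # One slice of work: vacuum one index of one table up to some
        # point, small enough that a worker can switch tables often.
        def __init__(self, table, index, deadline, benefit):
            self.table = table
            self.index = index
            self.deadline = deadline   # latest acceptable start time
            self.benefit = benefit     # provisional estimate; revisable

    class GlobalSchedule:
        # One schedule shared by all (~3) workers and spanning all
        # tables, instead of one worker camping on one table.
        def __init__(self):
            self._heap = []
            self._seq = 0              # heap tie-breaker

        def submit(self, task):
            # Min-heap ordered by deadline, then by benefit (negated,
            # so higher-benefit work sorts first at equal deadlines).
            heapq.heappush(self._heap,
                           (task.deadline, -task.benefit, self._seq, task))
            self._seq += 1

        def revise(self, table, index, deadline, benefit):
            # A provisional decision isn't the last word: as the global
            # picture changes, submit an updated task. (A real version
            # would also invalidate the now-stale entry.)
            self.submit(IndexVacuumTask(table, index, deadline, benefit))

        def next_task(self):
            return heapq.heappop(self._heap)[3] if self._heap else None

    def worker_loop(schedule):
        # Each worker takes whichever slice of work the global schedule
        # says matters most right now, across every table.
        while (task := schedule.next_task()) is not None:
            print(f"vacuuming {task.table}.{task.index}")  # stand-in

The data structure is beside the point; what matters is that no worker
ever commits to finishing a whole table's indexes up front.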

Costs are paid by the whole system, but benefits only go to individual
tables and indexes. Being able to manage costs over time, with a sense
of the benefits and of high-level priorities, will be *huge* for us.
Managing debt at the level of the entire system (not just one table or
index) is also really important. (Though maybe we should focus on the
v1, just because that's what is needed right now.)

> > We will need something like that. I think that LP_DEAD items (or
> > would-be LP_DEAD items -- tuples with storage that would get pruned
> > into LP_DEAD items if we were to prune) in the table are much more
> > interesting than dead heap-only tuples, and also more interesting than
> > dead index tuples. Especially the distribution of such LP_DEAD items
> > in the table, and their concentration. That does seem much more likely
> > to be robust as a quantitative driver of index vacuuming.
>
> Hmm... why would the answer have to do with dead items in the heap?

We're eventually going to have to make the LP_DEAD items LP_UNUSED
anyway here. So we might as well get started on that, with the index
that we *also* think is the one that might need it the most, for its
own reasons. We're making a decision on the basis of multiple factors,
knowing that in the worst case (when the index really didn't need
anything at all) we will have at least had the benefit of doing some
actually-useful work sooner rather than later. We should probably
consider multiple reasons to do any unit of work.
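
To sketch what "multiple factors" might look like in practice (Python
again, with made-up field names and weights -- nothing exposes
statistics in exactly this form):

    from dataclasses import dataclass

    @dataclass
    class IndexStats:
        name: str
        lp_dead_fraction: float  # share of heap pages with LP_DEAD items
        index_growth: float      # recent index growth (pages/hour)
        table_growth: float      # recent heap growth (pages/hour)

    def vacuum_priority(ix: IndexStats) -> float:
        # LP_DEAD items must eventually become LP_UNUSED, which means
        # visiting every index anyway -- so heap-side LP_DEAD density
        # is one reason to start sooner.
        score = 1.0 * ix.lp_dead_fraction
        # Independent evidence that this particular index is bloating
        # (growth beyond what the table's growth explains) is another.
        score += 2.0 * max(0.0, ix.index_growth - ix.table_growth)
        return score

    def choose_index(indexes):
        # Worst case we pick an index that didn't urgently need it --
        # but then we've merely done required work sooner than needed.
        return max(indexes, key=vacuum_priority)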

> I was thinking along the lines of trying to figure out either a more
> reliable count of dead tuples in the index, subtracting out whatever
> we save by kill_prior_tuple and bottom-up vacuuming; or else maybe a
> count of the subset of dead tuples that are likely not to get
> opportunistically pruned in one way or another, if there's some way to
> guess that.

I don't know how to build something like that, since it would have to
work by understanding what's going right, not by noticing that some
existing strategy plainly isn't working. The only positive information
that I have
confidence in is the extreme case where you have zero index growth.
Which is certainly possible, but perhaps not that interesting with a
real workload.

There are emergent behaviors with bottom-up deletion. Purely useful
behaviors, as far as I know, but still very hard to precisely nail
down. For example, Victor Yegorov came up with an adversarial
benchmark [1] involving index bloat from queue-like inserts and
deletes that recycled the same distinct key values over time, mixed
with non-HOT updates. The technique dealt very well with that, even
though *I had no clue* that it would work *at all*; I might even have
incorrectly predicted the opposite if Victor had asked me about it in
advance.

> I realize I'm
> hand-waving, but if the property is a property of the heap rather than
> the index, how will different indexes get different treatment?

Maybe by making the primary key's growth an indicator of what is
reasonable for the other indexes (or other B-Tree indexes) -- it has a
natural tendency to be the least bloated possible index. If you have
something like a GiST index, or if you have a B-Tree index that
constantly gets non-HOT updates that logically modify an indexed
column, then it should become reasonably obvious. Maybe there'd be
some kind of feedback behavior to lock in "bloat prone index" for a
time.
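
A cartoon version of that heuristic (same caveats as before -- every
name here is invented):

    def bloat_prone(growth_by_index, pk_growth, threshold=1.5):
        # The primary key has a natural tendency to be the least
        # bloated possible index, so use its growth as the yardstick.
        # Anything growing much faster stands out; a feedback loop
        # could then keep it flagged as "bloat prone" for a while.
        baseline = max(pk_growth, 1e-9)  # avoid division by zero
        return {name for name, growth in growth_by_index.items()
                if growth / baseline > threshold}

    # e.g. bloat_prone({"gist_geom": 4.0, "btree_ts": 1.1},
    #                  pk_growth=1.0) == {"gist_geom"}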

If we can bring costs into it too (e.g., spreading out the burden of
index vacuuming over time), then it becomes acceptable to incorrectly
determine which index needed special attention. We will still remember
that that one index has been vacuumed up to a certain point, which is
still useful -- that work would have had to be completed either way,
so it's no real loss. Plus we've spread the burden out over time,
which is a win in itself. The cost control stuff could easily
more than make up for the fact that we don't have a mythical perfect
model that always knows exactly what to do, when, based on the needs
of indexes.
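
The "remembering" part is conceptually just a high-water mark per
index (a sketch, assuming some notion of a position in the stream of
dead TIDs -- whatever form that ends up taking in the real design):

    class IndexVacuumProgress:
        # Hypothetical bookkeeping: per index, how far into the stream
        # of dead TIDs it has been vacuumed. Work done on an index that
        # turned out not to urgently need it is never wasted, because
        # its high-water mark advances all the same.
        def __init__(self):
            self._done_up_to = {}  # index name -> position in stream

        def record(self, index, position):
            prev = self._done_up_to.get(index, 0)
            self._done_up_to[index] = max(prev, position)

        def backlog(self, index, newest_dead_position):
            # Required work still outstanding for this index.
            done = self._done_up_to.get(index, 0)
            return max(0, newest_dead_position - done)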

I think that expanding the scope to cover cost management actually
makes this project easier, not harder. Costs really matter, and are
much easier to understand. Cost control makes it okay to guess about
benefits for the index/queries and be wrong.

[1] https://www.postgresql.org/message-id/CAGnEbogATZS1mWMVX8FzZHMXzuDEcb10AnVwwhCtXtiBpg3XLQ@mail.gmail.com
--
Peter Geoghegan


