Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: decoupling table and index vacuum
Msg-id: CA+TgmobrGs0c2W7XLh_GdW2253dwwNx4vAsvPiuLUEMhFqDB1Q@mail.gmail.com
In response to: Re: decoupling table and index vacuum (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Wed, Feb 9, 2022 at 6:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> You seem to be vastly underestimating the value in being able to
> spread out and reschedule the work, and manage costs more generally.

Hmm. I think you're vastly overestimating the extent to which it's
possible to spread out and reschedule the work. I don't know which of
us is wrong. From my point of view, if VACUUM is going to do a full
phase 1 heap pass and a full phase 2 heap pass on either side of
whatever index work it does, there is no way that things are going to
get that much more dynamic than they are today. And even if we didn't
do that, in order to make any progress setting LP_DEAD pointers to
LP_UNUSED, you have to vacuum the entire index, which might be BIG. It
would be great to have a lot of granularity here but it doesn't seem
achievable.

> > I was thinking along the lines of trying to figure out either a more
> > reliable count of dead tuples in the index, subtracting out whatever
> > we save by kill_prior_tuple and bottom-up vacuuming; or else maybe a
> > count of the subset of dead tuples that are likely not to get
> > opportunistically pruned in one way or another, if there's some way to
> > guess that.
>
> I don't know how to build something like that, since that works by
> understanding what's working, not by noticing that some existing
> strategy plainly isn't working. The only positive information that I
> have confidence in is the extreme case where you have zero index
> growth. Which is certainly possible, but perhaps not that interesting
> with a real workload.
>
> There are emergent behaviors with bottom-up deletion. Purely useful
> behaviors, as far as I know, but still very hard to precisely nail
> down.
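The dependency Robert describes above can be illustrated with a toy model. This is not PostgreSQL code, and all names in it are hypothetical; it only shows the structural point that a dead line pointer cannot safely become LP_UNUSED until *every* index has been scanned in full, because until then some index may still hold an entry pointing at that TID:

```python
# Toy model (not PostgreSQL source) of vacuum's phase structure.
# Phase 1 (heap pass) has already collected dead TIDs; each index must
# then be vacuumed in its entirety before phase 2 may reuse the line
# pointers -- reusing one early would let an index probe land on
# unrelated data.

def vacuum(heap_dead_tids, indexes):
    """heap_dead_tids: set of TIDs found dead by the phase-1 heap pass.
    indexes: list of sets, each the TIDs currently held by one index."""
    # Index vacuuming: every index is scanned whole, regardless of how
    # few dead TIDs there are -- the all-or-nothing cost at issue here.
    for idx in indexes:
        idx -= heap_dead_tids   # remove index entries for dead TIDs
    # Phase 2: only now is it safe to mark line pointers LP_UNUSED.
    return {tid for tid in heap_dead_tids
            if not any(tid in idx for idx in indexes)}

dead = {(1, 1), (1, 2)}
indexes = [{(1, 1), (1, 2), (2, 5)}, {(1, 2), (3, 7)}]
reusable = vacuum(dead, indexes)  # both dead TIDs become reusable
```

Skipping even one index in the loop would leave some TIDs out of the returned set, which is exactly why partial index vacuuming buys no line-pointer reclamation.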
> For example, Victor Yegorov came up with an adversarial
> benchmark [1] that showed that the technique dealt with index bloat
> from queue-like inserts and deletes that recycled the same distinct
> key values over time, since they happened to be mixed with non-hot
> updates. It dealt very well with that, even though *I had no clue*
> that it would work *at all*, and might have even incorrectly predicted
> the opposite if Victor had asked about it in advance.

I don't understand what your point is in these two paragraphs. I'm
just arguing that, if a raw dead tuple count is meaningless because a
lot of them are going to disappear harmlessly with or without vacuum,
it's reasonable to try to get around that problem by counting the
subset of dead tuples where that isn't true. I agree that it's unclear
how to do that, but that doesn't mean that it can't be done.

> > I realize I'm
> > hand-waving, but if the property is a property of the heap rather than
> > the index, how will different indexes get different treatment?
>
> Maybe by making the primary key growth an indicator of what is
> reasonable for the other indexes (or other B-Tree indexes) -- it has a
> natural tendency to be the least bloated possible index. If you have
> something like a GiST index, or if you have a B-Tree index that
> constantly gets non-HOT updates that logically modify an indexed
> column, then it should become reasonably obvious. Maybe there'd be
> some kind of feedback behavior to lock in "bloat prone index" for a
> time.

I have the same concern about this as what I mentioned before: it's
purely retrospective. Therefore in my mind it's a very reasonable
choice for a backstop, but a somewhat unsatisfying choice for a
primary mechanism.

--
Robert Haas
EDB: http://www.enterprisedb.com
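The accounting Robert is arguing for, counting only the dead tuples that opportunistic cleanup won't handle, can be sketched roughly as follows. The function name, parameters, and numbers are all hypothetical, chosen purely for illustration; nothing like this exists in PostgreSQL:

```python
# Hedged sketch of "count the subset of dead tuples that matter":
# start from the raw dead-tuple count and subtract an estimate of what
# kill_prior_tuple hints and bottom-up index deletion will reclaim
# anyway, clamping at zero since the opportunistic counts can overlap.

def effective_dead_tuples(raw_dead, killed_by_hints, bottom_up_deleted):
    """Estimate dead tuples an index vacuum would actually have to handle.

    raw_dead:           dead tuples counted since the last vacuum
    killed_by_hints:    entries already marked dead via kill_prior_tuple
    bottom_up_deleted:  entries reclaimed by bottom-up index deletion
    """
    return max(raw_dead - killed_by_hints - bottom_up_deleted, 0)

# e.g. 10,000 raw dead tuples, most already reclaimed opportunistically:
estimate = effective_dead_tuples(10_000, 6_500, 2_000)  # 1500
```

The open question in the thread is precisely how to obtain reliable values for the two subtracted terms, not the arithmetic itself.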