Re: decoupling table and index vacuum - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: decoupling table and index vacuum |
Date | |
Msg-id | CAH2-WzmfwcsUim032V4DQVdDJ4tdH1_HXTSfWoFhPKaR-kHJKg@mail.gmail.com |
In response to | Re: decoupling table and index vacuum (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Thu, Feb 10, 2022 at 11:14 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm. I think you're vastly overestimating the extent to which it's
> possible to spread out and reschedule the work. I don't know which of
> us is wrong. From my point of view, if VACUUM is going to do a full
> phase 1 heap pass and a full phase 2 heap pass on either side of
> whatever index work it does, there is no way that things are going to
> get that much more dynamic than they are today.

Waiting to vacuum each index allows us to wait until the next VACUUM operation on the table, giving us more TIDs to remove when we do go to vacuum one of these large indexes. Making decisions dynamically seems very promising because it gives us the most flexibility. In principle the workload might not allow for any of that, but in practice I think that it will.

> I don't understand what your point is in these two paragraphs. I'm
> just arguing that, if a raw dead tuple count is meaningless because a
> lot of them are going to disappear harmlessly with or without vacuum,
> it's reasonable to try to get around that problem by counting the
> subset of dead tuples where that isn't true. I agree that it's unclear
> how to do that, but that doesn't mean that it can't be done.

VACUUM is a participant in the system -- it sees how physical relations are affected by the workload, but it also sees how physical relations are affected by previous VACUUM operations. If it goes to VACUUM an index on the basis of a relatively small difference (that might just be noise), and does so systematically and consistently, that might have unintended consequences. In particular, we might do the wrong thing, again and again, because we're overinterpreting noise again and again.

> I have the same concern about this as what I mentioned before: it's
> purely retrospective. Therefore in my mind it's a very reasonable
> choice for a backstop, but a somewhat unsatisfying choice for a
> primary mechanism.

I'm not saying that it's impossible or even unreasonable to do something based on the current or anticipated state of the index, exactly. Just that you have to be realistic about how accurate that model is going to be in practice. In practice it'll be quite noisy, and that must be accounted for. For example, we could deliberately coarsen the information, so that only relatively large differences in apparent-bloatedness are visible to the model.

The other thing is that VACUUM itself cannot be expected to operate with all that much precision, just because of how it works at a high level. Any quantitative measure will only be meaningful as a way of prioritizing work, which is going to be far easier if we make the behavior dynamic and continually reassess. Once a relatively large difference between two indexes first emerges, we can be relatively confident about what to do. But smaller differences are likely just noise.

--
Peter Geoghegan
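
To make the "only large differences are visible" idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `pick_index_to_vacuum`, the notion of a per-index bloat score, and the 2x threshold are hypothetical and are not part of PostgreSQL or of any patch in this thread. It singles out an index only when its noisy score exceeds the runner-up by a large factor, and otherwise reports no clear winner.

```python
from typing import Dict, Optional


def pick_index_to_vacuum(
    bloat_scores: Dict[str, float], min_ratio: float = 2.0
) -> Optional[str]:
    """Return the index that clearly stands out, or None if nothing does.

    bloat_scores maps a (hypothetical) index name to a noisy, positive
    estimate of how bloated it looks. min_ratio is an assumed threshold:
    the top score must be at least this many times the runner-up before
    the difference is treated as signal rather than noise.
    """
    if len(bloat_scores) < 2:
        # With fewer than two indexes there is nothing to compare against.
        return None

    # Rank indexes from most to least apparently bloated.
    ranked = sorted(bloat_scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top_score), (_, runner_up_score) = ranked[0], ranked[1]

    if runner_up_score <= 0:
        # Avoid dividing by zero; only a positive top score stands out.
        return top_name if top_score > 0 else None

    # Small gaps are ignored on purpose; only a large ratio counts.
    if top_score / runner_up_score >= min_ratio:
        return top_name
    return None
```

Called again on each VACUUM of the table, a check like this would keep returning `None` while the scores stay close, and would start naming an index only once a large gap has actually emerged.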