Re: decoupling table and index vacuum - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: decoupling table and index vacuum
Msg-id: CAH2-Wzkvirvn0vL_bjzcejdwJM+2QKFu31FPs73JKaF+-+AmyQ@mail.gmail.com
In response to: Re: decoupling table and index vacuum (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, Apr 23, 2021 at 8:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 22, 2021 at 4:52 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Mostly what I'm saying is that I would like to put together a rough
> > list of things that we could do to improve VACUUM along the lines
> > we've discussed -- all of which stem from $SUBJECT. There are
> > literally dozens of goals (some of which are quite disparate) that we
> > could conceivably set out to pursue under the banner of $SUBJECT.
>
> I hope not. I don't have a clue why there would be dozens of possible
> goals here, or why it matters.

Not completely distinct goals, for the most part, but I can certainly
see dozens of benefits. For example, if we know before index vacuuming
starts that heap vacuuming definitely won't go ahead (quite possible
when we decide that we're only vacuuming a subset of indexes), we can
then tell the index AM about that fact. It can then safely vacuum in an
"approximate" fashion, for example by skipping pages whose LSNs are
from before the last VACUUM, and by not bothering with a
super-exclusive lock in the case of nbtree.

The risk of a conflict between this goal and another goal that we may
want to pursue (which might be a bit contrived) is that we fail to do
the right thing when a large range deletion has taken place. That must
be accounted for in the statistics, but it creates a tension with the
global index stuff. It's probably only safe to do this when we know
that there have been hardly any DELETEs.

There is also the question of how the TID map thing interacts with the
visibility map, and how that affects how VACUUM behaves (both in
general and in order to attain some kind of specific new benefit from
this synergy). Who knows? We're never going to get on exactly the same
page, but some rough idea of which page each of us is on might save
everybody time.
The stuff that I went into about making aborted transactions special as
a means of decoupling transaction status management from garbage
collection is arguably totally unrelated -- perhaps it's just too much
of a stretch to link that to what you want to do now. I suppose it's
hard to invest the time to engage with me on that stuff, and I wouldn't
be surprised if you never did so. If it doesn't happen it would be
understandable, though quite possibly a missed opportunity for both of
us. My basic intuition there is that it's another variety of
decoupling, so (for better or worse) it does *seem* related to me. (I
am an intuitive thinker, which has advantages and disadvantages.)

> I think if we're going to do something
> like $SUBJECT, we should just concentrate on the best way to make that
> particular change happen with minimal change to anything else.
> Otherwise, we risk conflating this engineering effort with others that
> really should be separate endeavors.

Of course it's true that that is a risk. That doesn't mean that the
opposite risk is not also a concern. I am concerned about both risks,
and I'm not sure which one I should be more concerned about.

I agree that we ought to focus on a select few goals as part of the
first round of work in this area (without necessarily doing all or even
most of them at the same time). It's not self-evident which goals those
should be at this point, though. You've said that you're interested in
global indexes. Okay, that's a start. I'll add the basic idea of doing
index vacuuming for some indexes and not others to the list -- this
will necessitate that we teach index AMs to assess how much bloat the
index has accumulated since the last VACUUM, which presumably must work
in some generalized, composable way.

> For example, as far as possible,
> I think it would be best to try to do this without changing the
> statistics that are currently gathered, and just make the best
> decisions we can with the information we already have.
I have no idea if that's the right way to do it. In any case the
statistics that we gather influence the behavior of autovacuum.c, but
nothing stops us from gathering our own statistics dynamically to
decide what we should do within vacuumlazy.c each time. We don't have
to change the basic triggering conditions to change the work each
VACUUM performs.

As I've said before, I think that we're likely to get more benefit (at
least at first) from making the actual reality of what VACUUM does
simpler and more predictable in practice than from changing how reality
is modeled inside autovacuum.c. I'll go further with that now: if we do
change that modeling at some point, I think that it should work in an
additive way, which can probably be compatible with how the statistics
and so on work already. For example, maybe vacuumlazy.c asks
autovacuum.c to do a VACUUM earlier next time. This can probably be
structured as an exception to the general rule of autovacuum scheduling
-- something that occurs when it becomes evident that the generic
schedule isn't quite cutting it in some important, specific way.

> Ideally, I'd
> like to avoid introducing a new kind of relation fork that uses a
> different on-disk storage format (e.g. 16MB segments that are dropped
> from the tail) rather than the one used by the other forks, but I'm
> not sure we can get away with that, because conveyor-belt storage
> looks pretty appealing here.

No opinion on that just yet.

> Regardless, the more we have to change to
> accomplish the immediate goal, the more likely we are to introduce
> instability into places where it could have been avoided, or to get
> tangled up in endless bikeshedding.

Certainly true. I'm not really trying to convince you of specific
actionable points just yet, though. Perhaps that was the problem (or
perhaps it simply led to miscommunication). It would be so much easier
to discuss some of this stuff at an event like pgCon. Oh well.

--
Peter Geoghegan