From: Peter Geoghegan <pg@bowt.ie>
Subject: Re: New IndexAM API controlling index vacuum strategies
Msg-id: CAH2-WznEkZT6mFSphn-8KfLhQFK+xEpV9a0mhLkvfvGbf2+t4g@mail.gmail.com
In response to: Re: New IndexAM API controlling index vacuum strategies (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Thu, Mar 18, 2021 at 2:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 17, 2021 at 11:23 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Most anti-wraparound VACUUMs are really not emergencies, though.
>
> That's true, but it's equally true that most of the time it's not
> necessary to wear a seatbelt to avoid personal injury. The difficulty
> is that it's hard to predict on which occasions it is necessary, and
> therefore it is advisable to do it all the time.

Just to be clear: this was pretty much the point I was making here --
although I guess you're making the broader point about autovacuum and
freezing in general.

The fact that we can *continually* reevaluate whether an ongoing VACUUM
is at risk of taking too long is entirely the point here. We can in
principle end index vacuuming dynamically, whenever we feel like it and
for whatever reasons occur to us (hopefully these are good reasons --
the point is that we get to pick and choose).

We can afford to be pretty aggressive about not giving up, while still
having the benefit of doing that when it *proves* necessary. Because:
what are the chances of the emergency mechanism ending index vacuuming
being the wrong thing to do, if we only do that when the system clearly
and measurably has no more than about 10% of the possible XID space to
go before it becomes unavailable for writes? What could possibly matter
more than that?

By making the decision dynamic, the chances of our
threshold/heuristics causing the wrong behavior become negligible --
even though we're making the decision based on a tiny amount of
(current, authoritative) information. The only novel risk I can think
of is that somebody comes to rely on the mechanism saving the day, over
and over again, rather than fixing a fixable problem.

> autovacuum decides whether an emergency exists, in the first instance,
> by comparing age(relfrozenxid) to autovacuum_freeze_max_age, but
> that's problematic for at least two reasons. First, what matters is
> not when the vacuum starts, but when the vacuum finishes.

To be fair, the vacuum_set_xid_limits() mechanism that you refer to
makes perfect sense. It's just totally insufficient, for the reasons
you say.

> A user who has no tables larger than 100MB can set
> autovacuum_freeze_max_age a lot closer to the high limit without risk
> of hitting it than a user who has a 10TB table. The time to run vacuum
> is dependent on both the size of the table and the applicable cost
> delay settings, none of which autovacuum knows anything about. It also
> knows nothing about the XID consumption rate. It's relying on the user
> to set autovacuum_freeze_max_age low enough that all the
> anti-wraparound vacuums will finish before the system crashes into a
> wall.

Literally nobody on earth knows what their XID burn rate is when it
really matters. It might be totally out of control on that one day of
your life where it truly matters (e.g., due to a recent buggy code
deployment, which I've seen up close). That's how emergencies work.

A dynamic approach is not merely preferable. It seems essential. No
top-down plan is going to be smart enough to predict that it'll take a
really long time to get that one super-exclusive lock on relatively few
pages.
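To make that concrete, here is a standalone toy sketch of the kind of
check I have in mind. Everything here is illustrative only -- the
function names and the exact cutoff are made up, and the server's real
XID comparisons would go through TransactionIdPrecedes() and friends
rather than raw unsigned arithmetic:

    /*
     * Toy model (not the patch): how an already-running VACUUM might
     * keep re-checking whether the wraparound failsafe should kick in.
     * XIDs are 32-bit and wrap around, so the age computation relies
     * on well-defined unsigned overflow.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    #define FIRST_NORMAL_XID  3            /* XIDs 0-2 are reserved */
    #define FAILSAFE_XID_AGE  1800000000u  /* proposed default: 1.8 billion */

    /* Age of the table's relfrozenxid, given the next XID to be assigned */
    static uint32_t
    xid_age(TransactionId relfrozenxid, TransactionId next_xid)
    {
        return next_xid - relfrozenxid;
    }

    /*
     * Called periodically *during* an ongoing VACUUM: if the table's
     * oldest unfrozen XID has become dangerously old, give up on index
     * vacuuming and do only the work needed to advance relfrozenxid.
     */
    static bool
    failsafe_triggered(TransactionId relfrozenxid, TransactionId next_xid)
    {
        return xid_age(relfrozenxid, next_xid) >= FAILSAFE_XID_AGE;
    }

    int
    main(void)
    {
        TransactionId relfrozenxid = FIRST_NORMAL_XID;

        printf("age 1.0 billion: failsafe=%d\n",
               failsafe_triggered(relfrozenxid, relfrozenxid + 1000000000u));
        printf("age 1.9 billion: failsafe=%d\n",
               failsafe_triggered(relfrozenxid, relfrozenxid + 1900000000u));
        return 0;
    }

The point is just that the check is cheap enough to repeat throughout
the VACUUM, so the decision never has to be made up front.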
> Second, what happens to one table affects what happens to other
> tables. Even if you have perfect knowledge of your XID consumption
> rate and the speed at which vacuum will complete, you can't just
> configure autovacuum_freeze_max_age to allow exactly enough time for
> the vacuum to complete once it hits the threshold, unless you have one
> autovacuum worker per table so that the work for that table never has
> to wait for work on any other tables. And even then, as you mention,
> you have to worry about the possibility that a vacuum was already in
> progress on that table itself. Here again, we rely on the user to know
> empirically how high they can set autovacuum_freeze_max_age without
> cutting it too close.

But the VM is a lot more useful when you effectively eliminate index
vacuuming from the picture. And VACUUM has a pretty good understanding
of how that works. Index vacuuming remains the Achilles' heel, and I
think that avoiding it in some cases has tremendous value. It has
outsized importance now because we've significantly ameliorated the
problems in the heap by having the visibility map. What other factor
can make VACUUM take 10x longer than usual on occasion?

Autovacuum scheduling is essentially a top-down model of the needs of
the system -- and one with a lot of flaws. IMV we can make the model's
simplistic view of reality better by making the reality better (i.e.
simpler, more tolerant of stressors) instead of making the model
better.

> Now, that's not actually a good thing, because most users aren't smart
> enough to do that, and will either leave a gigantic safety margin that
> they don't need, or will leave an inadequate safety margin and take
> the system down. However, it means we need to be very, very careful
> about hard-coded thresholds like 90% of the available XID space. I do
> think that there is a case for triggering emergency extra safety
> measures when things are looking scary. One that I think would help a
> tremendous amount is to start ignoring the vacuum cost delay when
> wraparound danger (and maybe even bloat danger) starts to loom.

We've done a lot to ameliorate that problem in recent releases, simply
by updating the defaults.
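To put rough numbers on that (purely illustrative, standalone
arithmetic, using the stock vacuum_cost_limit = 200 and
vacuum_cost_page_dirty = 20, and comparing the old
autovacuum_vacuum_cost_delay default of 20ms against the 2ms default we
shipped in v12):

    /*
     * Back-of-the-envelope model of autovacuum's cost-based throttling.
     * Worst case: every page the VACUUM touches gets dirtied, so each
     * page costs vacuum_cost_page_dirty units against the budget of
     * vacuum_cost_limit units per delay interval.
     */
    #include <stdio.h>

    int
    main(void)
    {
        const double cost_limit      = 200.0;  /* vacuum_cost_limit */
        const double cost_page_dirty = 20.0;   /* vacuum_cost_page_dirty */
        const double block_mb        = 8192.0 / (1024.0 * 1024.0);
        const double delays_ms[2]    = {20.0, 2.0};  /* pre-v12 vs. v12+ */

        for (int i = 0; i < 2; i++)
        {
            /* cost budget per second of wall-clock time */
            double budget_per_sec = cost_limit * (1000.0 / delays_ms[i]);
            double pages_per_sec  = budget_per_sec / cost_page_dirty;

            printf("delay %4.0fms: at most %6.0f dirtied pages/s (~%.0f MB/s)\n",
                   delays_ms[i], pages_per_sec, pages_per_sec * block_mb);
        }
        return 0;
    }

Even at the improved default, a VACUUM that dirties most of the pages
it touches tops out at something like 39MB/s -- which is exactly why
avoiding dirtying pages in the first place matters so much.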
> Perhaps skipping index vacuuming is another such measure, though I
> suspect it would help fewer people, because in most of the cases I
> see, the system is throttled to use a tiny percentage of its actual
> hardware capability. If you're running at 1/5 of the speed of which
> the hardware is capable, you can only do better by skipping index
> cleanup if that skips more than 80% of page accesses, which could be
> true but probably isn't.

The proper thing for VACUUM to be throttled on these days is dirtying
pages. Skipping index vacuuming and skipping the second pass over the
heap will both make an enormous difference in many cases, precisely
because they'll avoid dirtying nearly so many pages -- especially in
the really bad cases, which are exactly where we see problems. Think
about how many pages you'll dirty with a UUID-based index with regular
churn from updates. Plus indexes don't have a visibility map, whereas
an append-mostly pattern is common with the largest tables.

Perhaps it doesn't matter, but FWIW I think that you're drastically
underestimating the extent to which index vacuuming is now the problem,
in a certain important sense. I think that skipping index vacuuming and
heap vacuuming (i.e. just doing the bare minimum, pruning) will in fact
reduce the number of page accesses by 80% in many, many cases. But I
suspect it makes an even bigger difference in the cases where users are
at most risk of wraparound-related outages to begin with. ISTM that
you're focussing too much on the everyday cases, the majority, which
are not the cases where everything truly falls apart. The extremes
really matter.

Index vacuuming gets really slow when we're low on maintenance_work_mem
-- horribly slow. Whereas that doesn't matter at all if you skip
indexes. What do you think the chances are that that was a major factor
at those sites that actually had an outage in the end?

My intuition is that eliminating worst-case variability is the really
important thing here. Heap vacuuming just doesn't have that
multiplicative quality. Its costs tend to be proportionate to the
workload, and stable over time.

> But ... should the thresholds for triggering these kinds of mechanisms
> really be hard-coded with no possibility of being configured in the
> field? What if we find out after the release is shipped that the
> mechanism works better if you make it kick in sooner, or later, or if
> it depends on other things about the system, which I think it almost
> certainly does? Thresholds that can't be changed without a recompile
> are bad news. That's why we have GUCs.

I'm fine with a GUC, though only for the emergency mechanism. The
default really matters, though -- it shouldn't be necessary to tune it
(since we're trying to address a problem that many people don't know
they have until it's too late). I still like 1.8 billion XIDs as the
value -- I propose that that be made the default.
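For what it's worth, I imagine the entry in guc.c's
ConfigureNamesInt[] would look about like this -- the name
"vacuum_failsafe_age" and its grouping are nothing more than
placeholders:

    /*
     * Hypothetical guc.c fragment -- nothing here is settled, least of
     * all the name.  Default of 1.8 billion XIDs, per the above.
     */
    {
        {"vacuum_failsafe_age", PGC_USERSET, CLIENT_CONN_STATEMENT,
            gettext_noop("Age at which VACUUM should trigger its emergency "
                         "failsafe and skip index vacuuming."),
            NULL
        },
        &vacuum_failsafe_age,
        1800000000, 0, 2100000000,
        NULL, NULL, NULL
    },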
> On another note, I cannot say enough bad things about the function
> name two_pass_strategy(). I sincerely hope that you're not planning to
> create a function which is a major point of control for VACUUM whose
> name gives no hint that it has anything to do with vacuum.

You always hate my names for things. But that's fine by me -- I'm
usually not very attached to them. I'm happy to change it to whatever
you prefer. FWIW, that name was intended to highlight that VACUUMs with
indexes will now always use the two-pass strategy. This is not to be
confused with the one-pass strategy, which is now strictly used on
tables with no indexes -- this even includes the INDEX_CLEANUP=off case
with the patch.

--
Peter Geoghegan