Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: New IndexAM API controlling index vacuum strategies |
Date | |
Msg-id | CA+Tgmobf4OSYHAortvwJ1RteJH6u0yiX2EK+aac8Z7MkpDK2KA@mail.gmail.com |
In response to | Re: New IndexAM API controlling index vacuum strategies (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: New IndexAM API controlling index vacuum strategies |
List | pgsql-hackers |
On Wed, Mar 17, 2021 at 11:23 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Most anti-wraparound VACUUMs are really not emergencies, though.

That's true, but it's equally true that most of the time it's not necessary to wear a seatbelt to avoid personal injury. The difficulty is that it's hard to predict on which occasions it is necessary, and therefore it is advisable to do it all the time.

autovacuum decides whether an emergency exists, in the first instance, by comparing age(relfrozenxid) to autovacuum_freeze_max_age, but that's problematic for at least two reasons.

First, what matters is not when the vacuum starts, but when the vacuum finishes. A user who has no tables larger than 100MB can set autovacuum_freeze_max_age a lot closer to the high limit without risk of hitting it than a user who has a 10TB table. The time to run vacuum is dependent on both the size of the table and the applicable cost delay settings, none of which autovacuum knows anything about. It also knows nothing about the XID consumption rate. It's relying on the user to set autovacuum_freeze_max_age low enough that all the anti-wraparound vacuums will finish before the system crashes into a wall.

Second, what happens to one table affects what happens to other tables. Even if you have perfect knowledge of your XID consumption rate and the speed at which vacuum will complete, you can't just configure autovacuum_freeze_max_age to allow exactly enough time for the vacuum to complete once it hits the threshold, unless you have one autovacuum worker per table so that the work for that table never has to wait for work on any other tables. And even then, as you mention, you have to worry about the possibility that a vacuum was already in progress on that table itself. Here again, we rely on the user to know empirically how high they can set autovacuum_freeze_max_age without cutting it too close.
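To make the two points above concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration, not anything autovacuum actually computes: the function names, the flat bytes-per-second vacuum throughput, the steady XID consumption rate, and the queue wait are all assumptions invented for the example, and 2,000,000,000 (the largest value autovacuum_freeze_max_age accepts) is used as a stand-in for "the high limit".

```python
# Illustrative stand-in for "the high limit": 2,000,000,000 is the largest
# value the autovacuum_freeze_max_age setting accepts.
HIGH_LIMIT_XIDS = 2_000_000_000

MB = 1024 ** 2
TB = 1024 ** 4


def xids_burned_before_finish(table_bytes, vacuum_bytes_per_sec,
                              xids_per_sec, queue_wait_sec=0.0):
    """XIDs consumed between crossing the threshold and the vacuum finishing.

    All inputs are hypothetical: a flat vacuum throughput (which in practice
    depends on the cost delay settings), a steady XID consumption rate, and
    the time spent waiting for a free autovacuum worker.
    """
    vacuum_sec = table_bytes / vacuum_bytes_per_sec
    return (queue_wait_sec + vacuum_sec) * xids_per_sec


def highest_safe_threshold(table_bytes, vacuum_bytes_per_sec, xids_per_sec,
                           queue_wait_sec=0.0, high_limit=HIGH_LIMIT_XIDS):
    """Largest trigger age that still lets the vacuum finish below high_limit."""
    burned = xids_burned_before_finish(table_bytes, vacuum_bytes_per_sec,
                                       xids_per_sec, queue_wait_sec)
    return high_limit - burned


if __name__ == "__main__":
    # Same assumed throughput and XID rate, very different table sizes.
    for label, size in (("100MB", 100 * MB), ("10TB", 10 * TB)):
        print(label, highest_safe_threshold(size, 20 * MB, 1000))
```

With these made-up inputs the 100MB table can wait until almost the very end, while the 10TB table has to start hundreds of millions of XIDs earlier, and any time spent queued behind other tables pushes that threshold lower still.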
Now, that's not actually a good thing, because most users aren't smart enough to do that, and will either leave a gigantic safety margin that they don't need, or will leave an inadequate safety margin and take the system down. However, it means we need to be very, very careful about hard-coded thresholds like 90% of the available XID space.

I do think that there is a case for triggering emergency extra safety measures when things are looking scary. One that I think would help a tremendous amount is to start ignoring the vacuum cost delay when wraparound danger (and maybe even bloat danger) starts to loom. Perhaps skipping index vacuuming is another such measure, though I suspect it would help fewer people, because in most of the cases I see, the system is throttled to use a tiny percentage of its actual hardware capability. If you're running at 1/5 of the speed of which the hardware is capable, you can only do better by skipping index cleanup if that skips more than 80% of page accesses, which could be true but probably isn't. In reality, I think we probably want both mechanisms, because they complement each other. If one can save 3X and the other 4X, the combination is a 12X improvement, which is a big deal. We might want other things, too.

But ... should the thresholds for triggering these kinds of mechanisms really be hard-coded with no possibility of being configured in the field? What if we find out after the release is shipped that the mechanism works better if you make it kick in sooner, or later, or if it depends on other things about the system, which I think it almost certainly does? Thresholds that can't be changed without a recompile are bad news. That's why we have GUCs.

On another note, I cannot say enough bad things about the function name two_pass_strategy(). I sincerely hope that you're not planning to create a function which is a major point of control for VACUUM whose name gives no hint that it has anything to do with vacuum.
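The speedup arithmetic above (the 1/5 speed versus 80% break-even, and 3X times 4X giving 12X) can be checked with a small Python sketch. It assumes a deliberately simple model in which vacuum time is proportional to the number of page accesses divided by the access rate; the function names are hypothetical and exist only for this illustration.

```python
import math


def speedup_from_unthrottling(fraction_of_hw_speed):
    """Speedup from ignoring the cost delay, if the throttled vacuum runs at
    the given fraction of what the hardware can do (e.g. 0.2 for 1/5)."""
    return 1.0 / fraction_of_hw_speed


def speedup_from_skipping(fraction_of_page_accesses_skipped):
    """Speedup from skipping index cleanup, if that avoids the given fraction
    of all page accesses and every access costs about the same."""
    return 1.0 / (1.0 - fraction_of_page_accesses_skipped)


def combined_speedup(*speedups):
    """Independent speedups multiply."""
    return math.prod(speedups)


if __name__ == "__main__":
    # Running at 1/5 of hardware speed: dropping the throttle is worth about
    # 5X, and skipping index cleanup only beats that above 80% of accesses.
    print(speedup_from_unthrottling(0.2))
    print(speedup_from_skipping(0.8))
    # One mechanism saving 3X and another saving 4X combine to 12X.
    print(combined_speedup(3, 4))
```

Under this model, skipping a fraction f of page accesses gives a speedup of 1/(1-f), which is why the break-even point against a 5X unthrottling gain sits at f = 0.8.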
--
Robert Haas
EDB: http://www.enterprisedb.com