Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers

From Robert Haas
Subject Re: New IndexAM API controlling index vacuum strategies
Date
Msg-id CA+Tgmobf4OSYHAortvwJ1RteJH6u0yiX2EK+aac8Z7MkpDK2KA@mail.gmail.com
Whole thread Raw
In response to Re: New IndexAM API controlling index vacuum strategies  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: New IndexAM API controlling index vacuum strategies  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Mar 17, 2021 at 11:23 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Most anti-wraparound VACUUMs are really not emergencies, though.

That's true, but it's equally true that most of the time it's not
necessary to wear a seatbelt to avoid personal injury. The difficulty
is that it's hard to predict on which occasions it is necessary, and
therefore it is advisable to do it all the time. autovacuum decides
whether an emergency exists, in the first instance, by comparing
age(relfrozenxid) to autovacuum_freeze_max_age, but that's problematic
for at least two reasons. First, what matters is not when the vacuum
starts, but when the vacuum finishes. A user who has no tables larger
than 100MB can set autovacuum_freeze_max_age a lot closer to the high
limit without risk of hitting it than a user who has a 10TB table. The
time to run vacuum is dependent on both the size of the table and the
applicable cost delay settings, none of which autovacuum knows
anything about. It also knows nothing about the XID consumption rate.
It's relying on the user to set autovacuum_freeze_max_age low enough
that all the anti-wraparound vacuums will finish before the system
crashes into a wall. Second, what happens to one table affects what
happens to other tables. Even if you have perfect knowledge of your
XID consumption rate and the speed at which vacuum will complete, you
can't just configure autovacuum_freeze_max_age to allow exactly enough
time for the vacuum to complete once it hits the threshold, unless you
have one autovacuum worker per table so that the work for that table
never has to wait for work on any other tables. And even then, as you
mention, you have to worry about the possibility that a vacuum was
already in progress on that table itself. Here again, we rely on the
user to know empirically how high they can set
autovacuum_freeze_max_age without cutting it too close.

Now, that's not actually a good thing, because most users aren't smart
enough to do that, and will either leave a gigantic safety margin that
they don't need, or will leave an inadequate safety margin and take
the system down. However, it means we need to be very, very careful
about hard-coded thresholds like 90% of the available XID space. I do
think that there is a case for triggering emergency extra safety
measures when things are looking scary. One that I think would help a
tremendous amount is to start ignoring the vacuum cost delay when
wraparound danger (and maybe even bloat danger) starts to loom.
Perhaps skipping index vacuuming is another such measure, though I
suspect it would help fewer people, because in most of the cases I
see, the system is throttled to use a tiny percentage of its actual
hardware capability. If you're running at 1/5 of the speed of which
the hardware is capable, you can only do better by skipping index
cleanup if that skips more than 80% of page accesses, which could be
true but probably isn't. In reality, I think we probably want both
mechanisms, because they complement each other. If one can save 3X and
the other 4X, the combination is a 12X improvement, which is a big
deal. We might want other things, too.

But ... should the thresholds for triggering these kinds of mechanisms
really be hard-coded with no possibility of being configured in the
field? What if we find out after the release is shipped that the
mechanism works better if you make it kick in sooner, or later, or if
it depends on other things about the system, which I think it almost
certainly does? Thresholds that can't be changed without a recompile
are bad news. That's why we have GUCs.

On another note, I cannot say enough bad things about the function
name two_pass_strategy(). I sincerely hope that you're not planning to
create a function which is a major point of control for VACUUM whose
name gives no hint that it has anything to do with vacuum.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: GROUP BY DISTINCT
Next
From: Michael Paquier
Date:
Subject: Re: a verbose option for autovacuum