Re: Defining (and possibly skipping) useless VACUUM operations - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Defining (and possibly skipping) useless VACUUM operations
Date:
Msg-id: CAH2-WznHQWhzzJzTgXj2JTu2bpaw7toO3KsuK1UCa7mzNcEMRg@mail.gmail.com
In response to: Re: Defining (and possibly skipping) useless VACUUM operations (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Dec 14, 2021 at 10:47 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Well I just don't understand why you insist on using the word
> "skipping." I think what we're talking about - or at least what we
> should be talking about - is whether relation_needs_vacanalyze() sets
> *wraparound = true right after the comment that says /* Force vacuum
> if table is at risk of wraparound */. And adding some kind of
> exception to the logic that's there now.

Actually, I agree. Skipping is the wrong term, especially because the
phrase "VACUUM skips..." is already too overloaded. Not necessarily in
vacuumlazy.c itself, but certainly on the mailing list.
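For anybody following along, here is a toy sketch of the check in
question -- deliberately simplified, not the real
relation_needs_vacanalyze() code (the real thing also has to worry
about XID wraparound arithmetic and multixacts), and
no_unfrozen_xids_left is a made-up stand-in for whatever signal we'd
actually use to decide that forcing the vacuum can't help:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t ToyTransactionId;

    /*
     * Toy model of the "force vacuum if table is at risk of wraparound"
     * test, plus the kind of exception being discussed.  This is NOT the
     * real autovacuum.c code: XID arithmetic is reduced to plain unsigned
     * subtraction, and no_unfrozen_xids_left is a hypothetical flag
     * standing in for "an anti-wraparound VACUUM could not advance
     * relfrozenxid any further anyway".
     */
    static bool
    toy_force_wraparound(ToyTransactionId relfrozenxid,
                         ToyTransactionId recent_xid,
                         uint32_t freeze_max_age,
                         bool no_unfrozen_xids_left)
    {
        uint32_t xid_age = recent_xid - relfrozenxid;

        /* Force vacuum if table is at risk of wraparound */
        bool force_vacuum = (xid_age > freeze_max_age);

        /* Hypothetical exception: forcing a vacuum that can't help buys nothing */
        if (force_vacuum && no_unfrozen_xids_left)
            force_vacuum = false;

        return force_vacuum;    /* caller would do *wraparound = force_vacuum */
    }

    int
    main(void)
    {
        /* relfrozenxid lags by 250M XIDs, freeze_max_age at the default-ish 200M */
        printf("%d\n", toy_force_wraparound(100, 250000100, 200000000, false)); /* 1 */
        printf("%d\n", toy_force_wraparound(100, 250000100, 200000000, true));  /* 0 */
        return 0;
    }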
> Yeah, I hadn't thought about it from that perspective, but that does
> seem very good. I think it's inevitable that there will be cases where
> that doesn't work out - e.g. you can always force the bad case by
> holding a table lock until your newborn heads off to college, or just
> by overthrottling autovacuum so that it can't get through the database
> in any reasonable amount of time - but it will be nice when it does
> work out, for sure.

Right. But when the patch doesn't manage to totally prevent
anti-wraparound VACUUMs, things still work out a lot better than they
would now. I would expect that in practice this will usually only
happen when non-aggressive autovacuums keep getting canceled. And
sure, it's still not ideal that things have come to that. But because
we now do freezing earlier (when it's relatively inexpensive), and
because we set all-frozen bits incrementally, the anti-wraparound
autovacuum will at least be able to reuse any freezing that we manage
to do in all those canceled autovacuums.

I think that this tends to make anti-wraparound VACUUMs mostly about
not being cancelable -- not so much about reliably advancing
relfrozenxid. I mean it doesn't change the basic rules (there is no
change to the definition of aggressive VACUUM), but in practice I
think that it'll just work that way. Which makes a great deal of
sense. I hope to be able to totally get rid of vacuum_freeze_table_age.

The freeze map work in PostgreSQL 9.6 was really great, and very
effective. But I think that it had an undesirable interaction with
vacuum_freeze_min_age: if we set a heap page as all-visible (but not
all frozen) before some of its tuples reached that age (which is very
likely), then tuples < vacuum_freeze_min_age aren't going to get
frozen until whenever we do an aggressive autovacuum. Very often, this
will only happen when we next do an anti-wraparound VACUUM (at least
before Postgres 13). I suspect we risk running into a "debt cliff" in
the eventual anti-wraparound autovacuum. And so while
vacuum_freeze_min_age kinda made sense prior to 9.6, it now seems to
make a lot less sense.

> > I guess that that makes avoiding useless vacuuming seem like less of a
> > priority. ISTM that it should be something that is squarely aimed at
> > keeping things stable in truly pathological cases.
>
> Yes. I think "pathological cases" is a good summary of what's wrong
> with autovacuum.

This is 100% my focus, in general. The main goal of the patch I'm
working on isn't so much improving performance as making it more
predictable over time. Focussing on freezing while costs are low has a
natural tendency to spread the costs out over time. The system should
never "get in over its head" with debt that vacuum is expected to
eventually deal with.

> When there's nothing too crazy happening, it actually
> does pretty well. But, when resources are tight or other corner cases
> occur, really dumb things start to happen. So it's reasonable to think
> about how we can install guard rails that prevent complete insanity.

Another thing that I really want to stamp out is anything involving a
tiny, seemingly-insignificant adverse event that has the potential to
cause disproportionate impact over time. For example, right now a
non-aggressive VACUUM will never be able to advance relfrozenxid when
it cannot get a cleanup lock on one heap page. It's actually extremely
unlikely that that should have much of any impact, at least when you
determine the new relfrozenxid for the table intelligently. Not
acquiring one cleanup lock on one heap page on a huge table should not
have such an extreme impact.

It's even worse when the systemic impact over time is considered.
Let's say you only have a 20% chance of failing to acquire one or more
cleanup locks during a non-aggressive autovacuum for a given large
table, meaning that you'll fail to advance relfrozenxid in at least
20% of all non-aggressive autovacuums. I think that that might be a
lot worse than it sounds, because the impact compounds over time --
I'm not sure that 20% is much worse than 60%, or much better than 5%
(very hard to model it). But if we make the high-level, abstract idea
of "aggressiveness" more of a continuous thing, and not something
that's defined by sharp (and largely meaningless) XID-based cutoffs,
we have every chance of nipping these problems in the bud (without
needing to model much of anything).
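Just to illustrate the compounding, here is a deliberately naive
back-of-envelope. It assumes each non-aggressive autovacuum fails to
advance relfrozenxid independently with probability p, which real
cleanup-lock conflicts certainly won't respect -- the exact numbers
matter much less than how quickly consecutive misses stack up:

    #include <stdio.h>

    /*
     * Naive model: with independent failures of probability p, the chance
     * of k consecutive autovacuums all failing to advance relfrozenxid is
     * p^k, and the expected number of autovacuums until it finally advances
     * is 1/(1 - p).  Real failures are not independent (a pinned page tends
     * to stay pinned), which is exactly why this is hard to model.
     */
    int
    main(void)
    {
        const double probs[] = {0.05, 0.20, 0.60};

        for (int i = 0; i < 3; i++)
        {
            double p = probs[i];
            double run = 1.0;

            printf("p = %.2f: expected autovacuums until relfrozenxid advances = %.2f\n",
                   p, 1.0 / (1.0 - p));
            for (int k = 1; k <= 4; k++)
            {
                run *= p;
                printf("  P(%d consecutive failures) = %.4f\n", k, run);
            }
        }
        return 0;
    }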
--
Peter Geoghegan