Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
Date | |
Msg-id | CAH2-Wz=HTfWpmeXeCP-Q9CuWGbHeZW-j75tgtdL7B7e2O2TnBA@mail.gmail.com Whole thread Raw |
In response to | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
|
List | pgsql-hackers |
On Tue, Mar 29, 2022 at 10:03 AM Robert Haas <robertmhaas@gmail.com> wrote: > I can understand this version of the commit message. Woohoo! I like > understanding things. That's good news. > I think the header comments for FreezeMultiXactId() focus way too much > on what the caller is supposed to do and not nearly enough on what > FreezeMultiXactId() itself does. I think to some extent this also > applies to the comments within the function body. To some extent this is a legitimate difference in style. I myself don't think that it's intrinsically good to have these sorts of comments. I just think that it can be the least worst thing when a function is intrinsically written with one caller and one very specific set of requirements in mind. That is pretty much a matter of taste, though. > I think I understand what the first paragraph of the header comment > for heap_tuple_needs_freeze() is trying to say, but the second one is > quite confusing. I think this is again because it veers into talking > about what the caller should do rather than explaining what the > function itself does. I wouldn't have done it that way if the function wasn't called heap_tuple_needs_freeze(). I would be okay with removing this paragraph if the function was renamed to reflect the fact it now tells the caller something about the tuple having an old XID/MXID relative to the caller's own XID/MXID cutoffs. Maybe the function name should be heap_tuple_would_freeze(), making it clear that the function merely tells caller what heap_prepare_freeze_tuple() *would* do, without presuming to tell the vacuumlazy.c caller what it *should* do about any of the information it is provided. Then it becomes natural to see the boolean return value and the changes the function makes to caller's relfrozenxid/relminmxid tracker variables as independent. > I don't like the statement-free else block in lazy_scan_noprune(). I > think you could delete the else{} and just put that same comment there > with one less level of indentation. There's a clear "return false" > just above so it shouldn't be confusing what's happening. Okay, will fix. > The comment hunk at the end of lazy_scan_noprune() would probably be > better if it said something more specific than "caller can tolerate > reduced processing." My guess is that it would be something like > "caller does not need to do something or other." I meant "caller can tolerate not pruning or freezing this particular page". Will fix. > I have my doubts about whether the overwrite-a-future-relfrozenxid > behavior is any good, but that's a topic for another day. I suggest > keeping the words "it seems best to", though, because they convey a > level of tentativeness, which seems appropriate. I agree that it's best to keep a tentative tone here. That code was written following a very specific bug in pg_upgrade several years back. There was a very recent bug fixed only last year, by commit 74cf7d46. FWIW I tend to think that we'd have a much better chance of catching that sort of thing if we'd had better relfrozenxid instrumentation before now. Now you'd see a negative value in the "new relfrozenxid: %u, which is %d xids ahead of previous value" part of the autovacuum log message in the event of such a bug. That's weird enough that I bet somebody would notice and report it. > I am surprised to see you write in maintenance.sgml that the VACUUM > which most recently advanced relfrozenxid will typically be the most > recent aggressive VACUUM. I would have expected something like "(often > the most recent VACUUM)". That's always been true, and will only be slightly less true in Postgres 15 -- the fact is that we only need to skip one all-visible page to lose out, and that's not unlikely with tables that aren't quite small with all the patches from v12 applied (we're still much too naive). The work that I'll get into Postgres 15 on VACUUM is very valuable as a basis for future improvements, but not all that valuable to users (improved instrumentation might be the biggest benefit in 15, or maybe relminmxid advancement for certain types of applications). I still think that we need to do more proactive page-level freezing to make relfrozenxid advancement happen in almost every VACUUM, but even that won't quite be enough. There are still cases where we need to make a choice about giving up on relfrozenxid advancement in a non-aggressive VACUUM -- all-visible pages won't completely go away with page-level freezing. At a minimum we'll still have edge cases like the case where heap_lock_tuple() unsets the all-frozen bit. And pg_upgrade'd databases, too. 0002 structures the logic for skipping using the VM in a way that will make the choice to skip or not skip all-visible pages in non-aggressive VACUUMs quite natural. I suspect that SKIP_PAGES_THRESHOLD was always mostly just about relfrozenxid advancement in non-aggressive VACUUM, all along. We can do much better than SKIP_PAGES_THRESHOLD, especially if we preprocess the entire visibility map up-front -- we'll know the costs and benefits up-front, before committing to early relfrozenxid advancement. Overall, aggressive vs non-aggressive VACUUM seems like a false dichotomy to me. ISTM that it should be a totally dynamic set of behaviors. There should probably be several different "aggressive gradations''. Most VACUUMs start out completely non-aggressive (including even anti-wraparound autovacuums), but can escalate from there. The non-cancellable autovacuum behavior (technically an anti-wraparound thing, but really an aggressiveness thing) should be something we escalate to, as with the failsafe. Dynamic behavior works a lot better. And it makes scheduling of autovacuum workers a lot more straightforward -- the discontinuities seem to make that much harder, which is one more reason to avoid them altogether. -- Peter Geoghegan
pgsql-hackers by date: