Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: New strategies for freezing, advancing relfrozenxid early |
Date | |
Msg-id | CAH2-WzkU42GzrsHhL2BiC1QMhaVGmVdb5HR0_qczz0Gu2aSn=A@mail.gmail.com |
In response to | Re: New strategies for freezing, advancing relfrozenxid early (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: New strategies for freezing, advancing relfrozenxid early (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-hackers |
On Thu, Sep 8, 2022 at 1:23 PM Peter Geoghegan <pg@bowt.ie> wrote:
> It might make sense to go further in the same direction by making
> "regular vs aggressive/antiwraparound" into a *strict* continuum. In
> other words, it might make sense to get rid of the two remaining cases
> where VACUUM conditions its behavior on whether this VACUUM operation
> is antiwraparound/aggressive or not.

I decided to go ahead with this in the attached revision, v5. This revision totally gets rid of the general concept of discrete aggressive/non-aggressive modes for each VACUUM operation (see "v5-0004-Make-VACUUM-s-aggressive-behaviors-continuous.patch" and its commit message). My new approach turned out to be simpler than the previous half measures that I described as "unifying aggressive and antiwraparound" (which itself first appeared in v3).

I now wish that I had all of these pieces in place for v1, since this was the direction I was thinking of all along -- that might have made life easier for reviewers like Jeff. What we have in v5 is what I had in mind all along, which turns out to have only a little extra code anyway. It might have been less confusing if I'd started this thread with something like v5 -- the story I need to tell would have been simpler that way. This is pretty much the end point I had in mind.

Note that we still retain what were previously "aggressive only" behaviors. We only remove "aggressive" as a distinct mode of operation that exclusively applies the aggressive behaviors. We're now selective in how we apply each of the behaviors, based on the needs of the table. We want to behave in a way that's proportionate to the problem at hand, which is made easy by not tying anything to a discrete mode of operation. It's a false dichotomy; why should we ever have only one reason for running VACUUM, that's determined up front?
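To illustrate the difference between a discrete mode and independently applied behaviors, here is a minimal sketch. It is not taken from the patch: the two behaviors shown, the function names, and the thresholds (`aggressive_age`, `advance_age`, `wait_age`) are all hypothetical, and table age is treated as a plain integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviors:
    """Which formerly "aggressive only" behaviors one VACUUM applies."""
    scan_all_visible_pages: bool   # assumed stand-in for the skipping strategy
    wait_for_cleanup_lock: bool    # willing to wait for a cleanup lock the hard way


def discrete_mode(table_age: int, aggressive_age: int) -> Behaviors:
    """Old-style model: one up-front flag switches every behavior on or off."""
    aggressive = table_age >= aggressive_age
    return Behaviors(aggressive, aggressive)


def continuous(table_age: int, advance_age: int, wait_age: int) -> Behaviors:
    """Continuous model: each behavior has its own (hypothetical) threshold."""
    return Behaviors(
        scan_all_visible_pages=table_age >= advance_age,
        wait_for_cleanup_lock=table_age >= wait_age,
    )
```

With a table age between the two thresholds, the continuous model applies one behavior without the other, whereas the discrete model can only apply both or neither.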
There are still antiwraparound autovacuums in v5, but that is really just another way that autovacuum can launch an autovacuum worker (much like it was before the introduction of the visibility map in 8.4) -- both conceptually, and in terms of how the code works in vacuumlazy.c. In practice an antiwraparound autovacuum is guaranteed to advance relfrozenxid in roughly the same way as on HEAD (otherwise what's the point?), but that doesn't make the VACUUM operation itself special in any way. Besides, antiwraparound autovacuums will naturally be rare, because there are many more opportunities for a VACUUM to advance relfrozenxid "early" now (only "early" relative to how it would work on early Postgres versions).

It's already clear that having antiwraparound autovacuums and aggressive mode VACUUMs as two separate concepts that are closely associated has some problems [1]. Formally making antiwraparound autovacuums just another way to launch a VACUUM via autovacuum seems quite useful to me.

For the most part users are expected to just take relfrozenxid advancement for granted now. They should mostly be able to assume that VACUUM will do whatever is required to keep it sufficiently current over time. They can influence VACUUM's behavior, but that mostly works at the level of the table (not the level of any individual VACUUM operation). The freezing and skipping strategy stuff should do what is necessary to keep up in the long run.

We don't want to put too much emphasis on relfrozenxid in the short run, because it isn't a reliable proxy for how we've kept up with the physical work of freezing -- that's what really matters. It should be okay to "fall behind on table age" in the short run, provided we don't fall behind on the physical work of freezing. Those two things shouldn't be conflated.
We now use a separate pair of XID/MXID-based cutoffs to determine whether or not we're willing to wait for a cleanup lock the hard way (which can happen in any VACUUM, since of course there is no longer any special VACUUM with special behaviors). The new pair of cutoffs replaces the use of FreezeLimit/MultiXactCutoff by lazy_scan_noprune (those are now only used to decide on what to freeze inside lazy_scan_prune). Same concept, but with a different, independent timeline. This was necessary just to get an existing isolation test (vacuum-no-cleanup-lock) to continue to work. But it just makes sense to have a different timeline for a completely different behavior. And it'll be more robust.

It's a really bad idea for VACUUM to try to wait indefinitely long for a cleanup lock, since that's totally outside of its control. It typically won't take very long at all for VACUUM to acquire a cleanup lock, of course, but that is beside the point -- who really cares what's true on average, for something like this? Sometimes it'll take hours to acquire a cleanup lock, and there is no telling when that might happen! And so pausing VACUUM/freezing of all other pages just to freeze one page makes little sense. Waiting for a cleanup lock before we really need to is just an overreaction, which risks making the situation worse. The cure must not be worse than the disease.

This revision also resolves problems with freezing MultiXactIds too lazily [2]. We now always trigger page level freezing in the event of encountering a Multi. This is more consistent with the behavior on HEAD, where we can easily process a Multi well before the cutoff represented by vacuum_multixact_freeze_min_age (e.g., we notice that a Multi has no members still running, making it safe to remove before the cutoff is reached).

Also attaching a prebuilt copy of the "routine vacuuming" docs as of v5. This is intended to be a convenience for reviewers, or anybody with a general interest in the patch series.
The docs certainly still need work, but I feel that I'm making progress on that side of things (especially in this latest revision). Making life easier for DBAs is the single most important goal of this work, so the user docs are of central importance. The current "Routine Vacuuming" docs have lots of problems, but to some extent the problems are with the concepts themselves.

[1] https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=deg@mail.gmail.com
[2] https://postgr.es/m/CAH2-Wz=+B5f1izRDPYKw+sUgOr6=AkWXp2NikU5cub0ftbRQhA@mail.gmail.com
--
Peter Geoghegan
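Editorial sketch of the two independent timelines described in the message above (freezing cutoff versus cleanup-lock cutoffs, plus the rule that any Multi triggers page-level freezing). This is an illustration under stated assumptions, not code from the patch: the type and function names and the fields `wait_xid_limit` and `wait_mxid_limit` are hypothetical, and XIDs are compared as plain integers, ignoring the wraparound-aware comparison that PostgreSQL actually needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cutoffs:
    freeze_limit: int      # freezing cutoff, consulted only when deciding what to freeze
    wait_xid_limit: int    # hypothetical: XID cutoff for waiting on a cleanup lock
    wait_mxid_limit: int   # hypothetical: MXID cutoff for waiting on a cleanup lock


@dataclass(frozen=True)
class PageSummary:
    oldest_xid: Optional[int]    # oldest unfrozen XID on the page, if any
    oldest_mxid: Optional[int]   # oldest MultiXactId on the page, if any


def should_freeze_page(page: PageSummary, c: Cutoffs) -> bool:
    """Freezing decision: any Multi triggers page-level freezing."""
    if page.oldest_mxid is not None:
        return True
    return page.oldest_xid is not None and page.oldest_xid < c.freeze_limit


def must_wait_for_cleanup_lock(page: PageSummary, c: Cutoffs) -> bool:
    """Waiting decision: uses its own cutoffs, independent of freeze_limit."""
    xid_too_old = page.oldest_xid is not None and page.oldest_xid < c.wait_xid_limit
    mxid_too_old = page.oldest_mxid is not None and page.oldest_mxid < c.wait_mxid_limit
    return xid_too_old or mxid_too_old
```

Keeping the waiting cutoffs older than the freezing cutoff means a page can be due for freezing long before VACUUM would be willing to block on it, which matches the stated goal of not waiting for a cleanup lock before it is really needed.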
Attachment
- routine-vacuuming.html
- v5-0001-Add-page-level-freezing-to-VACUUM.patch
- v5-0005-Avoid-allocating-MultiXacts-during-VACUUM.patch
- v5-0004-Make-VACUUM-s-aggressive-behaviors-continuous.patch
- v5-0003-Add-eager-freezing-strategy-to-VACUUM.patch
- v5-0006-Size-VACUUM-s-dead_items-space-using-VM-snapshot.patch
- v5-0002-Teach-VACUUM-to-use-visibility-map-snapshot.patch