Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date:
Msg-id: CAH2-Wzn9KRf3rBSDkYsK3tvW2JegLfqCCG2Kwi4Ay9QVmhwPSA@mail.gmail.com
In response to: Re: New strategies for freezing, advancing relfrozenxid early (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: New strategies for freezing, advancing relfrozenxid early
           Re: New strategies for freezing, advancing relfrozenxid early
List: pgsql-hackers
On Tue, Dec 6, 2022 at 1:45 PM Peter Geoghegan <pg@bowt.ie> wrote:
> v9 will also address some of the concerns you raised in your review
> that weren't covered by v8, especially about the VM snapshotting
> infrastructure. But also your concerns about the transition from lazy
> strategies to eager strategies.

Attached is v9. Highlights:

* VM snapshot infrastructure now spills using temp files when required (only in larger tables).

v9 is the first version that has a credible approach to resource management, which was something I put off until recently. We only use a fixed amount of memory now, which should be acceptable from the viewpoint of VACUUM resource management. The temp files use the BufFile infrastructure in a relatively straightforward way.

* VM snapshot infrastructure now uses explicit prefetching.

Our approach is straightforward, and perhaps even obvious: we prefetch at the point that VACUUM requests the next block in line. There is a configurable prefetch distance, controlled by maintenance_io_concurrency. We "stage" a couple of thousand BlockNumbers in VACUUM's vmsnap by bulk-reading from the vmsnap's local copy of the visibility map -- these staged blocks are returned to VACUUM to scan, with interlaced prefetching of later blocks from the same local BlockNumber array. (A rough sketch of the iterator shape appears a little further down.)

The addition of prefetching ought to be enough to avoid regressions that might otherwise result from the removal of SKIP_PAGES_THRESHOLD from vacuumlazy.c (see commit bf136cf6 from around the time the visibility map first went in for the full context). While I definitely need to do more performance validation work around prefetching (especially on high latency network-attached storage), I imagine that it won't be too hard to get into shape for commit. It's certainly not committable yet, but it's vastly better than v8.

The visibility map snapshot interface (presented by visibilitymap.h) also changed in v9, mostly to support prefetching. We now have an iterator style interface (so vacuumlazy.c cannot request random access). This iterator interface is implemented by visibilitymap.c using logic similar to the current lazy_scan_skip() logic from vacuumlazy.c (which is gone). All told, visibilitymap.c knows quite a bit more than it used to about high level requirements from vacuumlazy.c. For example it has explicit awareness of VM skipping strategies.

* Page-level freezing commit now freezes a page whenever VACUUM detects that pruning ran and generated an FPI.

Following a suggestion by Andres, page-level freezing is now always triggered when pruning needs an FPI. Note that this optimization gets applied regardless of freezing strategy (unless you turn off full_page_writes, I suppose). This optimization is added by the second patch (v9-0002-Add-page-level-freezing-to-VACUUM.patch).
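To make the trigger a little more concrete, here is a rough sketch of how the check can be expressed using the existing pgWalUsage instrumentation counter. This is an illustration only -- the helper name and the surrounding structure are made up for this mail, not lifted from the patch:

#include "postgres.h"
#include "executor/instrument.h"        /* pgWalUsage */

/*
 * Sketch only.  Caller samples pgWalUsage.wal_fpi immediately before
 * pruning the page, then asks whether pruning advanced the counter:
 *
 *     int64 fpi_before = pgWalUsage.wal_fpi;
 *     heap_page_prune(...);            -- may emit a full-page image
 *     if (prune_emitted_fpi(fpi_before))
 *         ... freeze all eligible tuples on this page ...
 */
static inline bool
prune_emitted_fpi(int64 fpi_before)
{
    /*
     * If the WAL full-page-image counter advanced, pruning just paid the
     * cost of an FPI for this page.  The marginal WAL cost of also
     * freezing the page now is comparatively small, so page-level
     * freezing is triggered regardless of the table-level freezing
     * strategy (assuming full_page_writes is on).
     */
    return pgWalUsage.wal_fpi > fpi_before;
}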
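And going back to the VM snapshot prefetching item above, here is a rough sketch of the shape of the iterator. Again, illustration only: the struct, function, and helper names are made up for this mail rather than taken from the patch. The idea is simply that each call hands VACUUM the next staged block, while issuing PrefetchBuffer() calls for blocks up to maintenance_io_concurrency positions ahead in the same staged array:

#include "postgres.h"
#include "storage/bufmgr.h"     /* PrefetchBuffer(), maintenance_io_concurrency */

#define VMSNAP_STAGE_SIZE   2048        /* "a couple of thousand" blocks */

typedef struct VMSnapIterSketch
{
    Relation    rel;
    BlockNumber staged[VMSNAP_STAGE_SIZE];  /* blocks VACUUM will scan next */
    int         nstaged;            /* valid entries in staged[] */
    int         next;               /* next staged[] entry to return */
    int         prefetched;         /* staged[] entries already prefetched */
} VMSnapIterSketch;

/* Assumed helper (not shown): bulk-read the vmsnap's local copy of the VM,
 * apply the chosen skipping strategy, and refill staged[].  Returns false
 * once there is nothing left for VACUUM to scan. */
static bool vmsnap_stage_more_blocks(VMSnapIterSketch *iter);

/*
 * Hand back the next block VACUUM should scan, prefetching later blocks
 * from the same staged[] array so that I/O stays ahead of the scan.
 */
static BlockNumber
vmsnap_next_block(VMSnapIterSketch *iter)
{
    if (iter->next >= iter->nstaged)
    {
        if (!vmsnap_stage_more_blocks(iter))
            return InvalidBlockNumber;  /* scan is done */
        iter->next = 0;
        iter->prefetched = 0;
    }

    /* Keep prefetching up to maintenance_io_concurrency blocks ahead */
    while (iter->prefetched < iter->nstaged &&
           iter->prefetched - iter->next < maintenance_io_concurrency)
    {
        PrefetchBuffer(iter->rel, MAIN_FORKNUM,
                       iter->staged[iter->prefetched]);
        iter->prefetched++;
    }

    return iter->staged[iter->next++];
}

The real interface obviously needs more than this (batch boundaries, the all-visible vs. all-frozen distinction, and so on), but that's the general shape.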
* Fixed the doc build.

* Much improved criteria for deciding on freezing and vmsnap skipping strategies in vacuumlazy.c's lazy_scan_strategy function -- an improved "cost model".

VACUUM should now give users a far smoother "transition" from lazy processing to eager processing. A table that starts out small (smaller than vacuum_freeze_strategy_threshold), but gradually grows, and eventually becomes fairly large (perhaps to a multiple of vacuum_freeze_strategy_threshold in size) will now experience a far more gradual transition, with catch-up freezing spread out over multiple VACUUM operations. We avoid big jumps in the overhead of freezing, where one particular VACUUM operation does all required "catch-up freezing" in one go.

My approach is to "stagger" the timeline for switching freezing strategy and vmsnap skipping strategy. We now change over from lazy to eager freezing strategy when the table size threshold (controlled by vacuum_freeze_strategy_threshold) is first crossed, just like in v8. But unlike v8, v9 will switch over to eager skipping in some later VACUUM operation (barring edge cases). This is implemented in a fairly simple way: eager skipping gets its own "separate" threshold, set to *twice* the current value of the vacuum_freeze_strategy_threshold GUC/reloption.

My approach of "staggering" multiple distinct behaviors to avoid having them all kick in during the same VACUUM operation isn't new to v9. The behavior around waiting for cleanup locks (added by v9-0005-Finish-removing-aggressive-mode-VACUUM.patch) is another example of the same general idea. In general I think that VACUUM shouldn't switch to more aggressive behaviors all at the same time, in the same VACUUM. Each distinct aggressive behavior has totally different properties, so there is no reason why VACUUM should start to apply each and every one of them at the same time. Some "aggressive" behaviors have the potential to make things quite a lot worse, in fact. The cure must not be worse than the disease.

* Related to the previous item (about the "cost model" that chooses a strategy), we now have a much more sophisticated approach to when and how we decide to advance relfrozenxid in smaller tables (tables whose size is < vacuum_freeze_strategy_threshold).

This improves things for tables that start out small and stay small: tables where we're unlikely to want to advance relfrozenxid in every single VACUUM (better to be lazy with such a table), but where we still want to be clever about advancing relfrozenxid "opportunistically". The way that VACUUM weighs both table age and the added cost of relfrozenxid advancement is more sophisticated in v9. The goal is to make it more likely that VACUUM will stumble upon opportunities to advance relfrozenxid when it happens to be cheap, which can happen for many reasons, all of which have a great deal to do with workload characteristics.

As in v8, v9 makes VACUUM willing to advance relfrozenxid without concern for table age, whenever it notices that the cost of doing so happens to be very cheap (in practice this means that the number of "extra" heap pages scanned is < 5% of rel_pages). However, in v9 we now go further by scaling this threshold through interpolation, based on table age. We have the same "5% of rel_pages" threshold when table age is less than half way towards the point that autovacuum.c will launch an antiwraparound autovacuum -- when we still have only minimal concern about table age. But the rel_pages-wise threshold starts to grow once table age gets past that "half way towards antiwrap AV" point.

We interpolate the rel_pages-wise threshold using a new approach in v9. At first the rel_pages-wise threshold grows quite slowly (relative to the rate at which table age approaches the point of forcing an antiwraparound AV). For example, when we're 60% of the way towards needing an antiwraparound AV, and VACUUM runs, we'll eagerly advance relfrozenxid provided that the "extra" cost of doing so happens to be less than ~22% of rel_pages. It "accelerates" from there (assuming fixed rel_pages). VACUUM will now tend to take advantage of individual table characteristics that make it relatively cheap to advance relfrozenxid.
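To make the shape of the new cost model easier to see, here is a rough sketch of the strategy-selection logic. Everything in it is illustrative: the struct, field, and helper names, the treatment of the GUC as a simple page count, and the exact interpolation curve are stand-ins chosen to be consistent with the numbers above (5% of rel_pages below the halfway point, roughly 22% at 60% of the way towards antiwraparound), not code from the patch:

#include "postgres.h"
#include "storage/block.h"      /* BlockNumber */

typedef struct StrategySketch
{
    BlockNumber rel_pages;              /* current size of the table */
    BlockNumber extra_allvisible_pages; /* extra pages we'd scan purely to
                                         * be able to advance relfrozenxid */
    double      tableagefrac;           /* 0.0-1.0: how far table age is
                                         * towards forcing antiwraparound
                                         * autovacuum */
    bool        eager_freezing;         /* freeze all tuples on scanned
                                         * pages? */
    bool        eager_skipping;         /* scan all-visible pages too, so
                                         * relfrozenxid can be advanced? */
} StrategySketch;

/*
 * threshold_pages: derived from the vacuum_freeze_strategy_threshold
 * GUC/reloption (treated as a page count here for simplicity).
 */
static void
lazy_scan_strategy_sketch(StrategySketch *s, BlockNumber threshold_pages)
{
    /* Stagger the two strategy switches by table size */
    s->eager_freezing = (s->rel_pages >= threshold_pages);
    s->eager_skipping = (s->rel_pages >= 2 * threshold_pages);

    if (!s->eager_skipping)
    {
        /*
         * Not forced to be eager by table size: still consider advancing
         * relfrozenxid opportunistically, whenever the extra all-visible
         * pages we'd have to scan are cheap relative to rel_pages.  "Cheap"
         * starts at 5% of rel_pages while table age is of little concern,
         * then grows at an accelerating rate once table age passes the
         * halfway point towards an antiwraparound autovacuum.
         */
        double      threshold;

        if (s->tableagefrac <= 0.5)
            threshold = 0.05;
        else
        {
            /*
             * Purely illustrative curve: a quadratic ramp starting from 5%
             * at the halfway point, reaching roughly 22% of rel_pages at
             * 60% of the way towards antiwraparound, and rel_pages itself
             * well before antiwraparound would actually be forced.
             */
            double      past_midpoint = s->tableagefrac - 0.5;

            threshold = 0.05 + 17.0 * past_midpoint * past_midpoint;
            threshold = Min(threshold, 1.0);
        }

        if (s->extra_allvisible_pages < threshold * s->rel_pages)
            s->eager_skipping = true;   /* cheap enough right now */
    }
}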
Bear in mind that the table characteristics that make relfrozenxid advancement cheap are not fixed for the same table. The "extra" cost of advancing relfrozenxid during this VACUUM (whether measured in absolute terms, or as a proportion of the net amount of work just to do simple vacuuming) just isn't predictable with real workloads. Especially not with the FPI opportunistic freezing stuff from the second patch (the "freeze when heap pruning gets an FPI" thing) in place. We should expect significant "natural variation" among tables, and within the same table over time -- this is a good thing.

For example, imagine a table that experiences a bunch of random deletes, which leads to a VACUUM that must visit most heap pages (say 85% of rel_pages). Let's suppose that those deletes are a once-off thing. The added cost of advancing relfrozenxid in the next VACUUM still isn't trivial (assuming the remaining 15% of pages are all-visible). But it is probably still worth doing if table age is at least starting to become a concern. It might actually be a lot cheaper to advance relfrozenxid early.

* Numerous structural improvements, lots of code polishing.

The patches have been reordered in a way that should make review a bit easier. The commit messages are now written in a way that clearly anticipates the removal of aggressive mode VACUUM, which the last patch actually finishes. Most of the earlier commits are presented as preparation for completely removing aggressive mode VACUUM.

The first patch (which refactors how VACUUM passes around cutoffs like FreezeLimit and OldestXmin by using a dedicated struct) is much improved. heap_prepare_freeze_tuple() now takes a more explicit approach to tracking what needs to happen for the tuple's freeze plan. This allowed me to pepper it with defensive assertions. It's also a lot clearer IMV. For example, we now have separate freeze_xmax and replace_xmax tracker variables.

The second patch in the series (the page-level freezing patch) is also much improved. I'm much happier with the way that heap_prepare_freeze_tuple() now explicitly delegates control of page-level freezing to FreezeMultiXactId() in v9, for example.

Note that I squashed the patch that taught VACUUM to size dead_items using scanned_pages into the main visibility map patch (v9-0004-Add-eager-and-lazy-VM-strategies-to-VACUUM.patch). That's why there are only 5 patches (down from 6) in v9.

--
Peter Geoghegan