Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers
From | Peter Geoghegan
Subject | Re: New strategies for freezing, advancing relfrozenxid early
Date |
Msg-id | CAH2-Wzm_=mrWO+kUAJbR_gM_6RzpwVA8n8e4nh3dJGHdw@mail.gmail.com
In response to | Re: New strategies for freezing, advancing relfrozenxid early (Andres Freund <andres@anarazel.de>)
Responses | Re: New strategies for freezing, advancing relfrozenxid early
List | pgsql-hackers
On Tue, Nov 15, 2022 at 9:20 PM Andres Freund <andres@anarazel.de> wrote:
> > Subject: [PATCH v6 1/6] Add page-level freezing to VACUUM.

Attached is v7, which incorporates much of your feedback. Thanks for the
review!

> > +/*
> > + * State used by VACUUM to track what the oldest extant XID/MXID will become
> > + * when determing whether and how to freeze a page's heap tuples via calls to
> > + * heap_prepare_freeze_tuple.
>
> Perhaps this could say something like "what the oldest extant XID/MXID
> currently is and what it would be if we decide to freeze the page" or such?

Fixed.

> > + * The relfrozenxid_out and relminmxid_out fields are the current target
> > + * relfrozenxid and relminmxid for VACUUM caller's heap rel. Any and all
>
> "VACUUM caller's heap rel." could stand to be rephrased.

Fixed.

> > + * unfrozen XIDs or MXIDs that remain in caller's rel after VACUUM finishes
> > + * _must_ have values >= the final relfrozenxid/relminmxid values in pg_class.
> > + * This includes XIDs that remain as MultiXact members from any tuple's xmax.
> > + * Each heap_prepare_freeze_tuple call pushes back relfrozenxid_out and/or
> > + * relminmxid_out as needed to avoid unsafe values in rel's authoritative
> > + * pg_class tuple.
> > + *
> > + * Alternative "no freeze" variants of relfrozenxid_nofreeze_out and
> > + * relminmxid_nofreeze_out must also be maintained for !freeze pages.
> > + */
>
> relfrozenxid_nofreeze_out isn't really a "no freeze variant" :)

Why not? I think that that's exactly what it is. We maintain these
alternative "oldest extant XID" values so that vacuumlazy.c's lazy_scan_prune
function can "opt out" of freezing. This is exactly the same as what we do in
lazy_scan_noprune, both conceptually and at the implementation level.

> I think it might be better to just always maintain the nofreeze state.

Not sure. Even if there is very little to gain in cycles by not maintaining
the "nofreeze" cutoffs needlessly, it's still a pure waste of cycles that can
easily be avoided. So it just feels natural to not waste those cycles -- it
may even make the design clearer.

> > +typedef struct HeapPageFreeze
> > +{
> > +    /* Is heap_prepare_freeze_tuple caller required to freeze page? */
> > +    bool        freeze;
>
> s/freeze/freeze_required/?

Fixed.

> Given the number of parameters to heap_prepare_freeze_tuple, why don't we pass
> in more of them in via HeapPageFreeze?

HeapPageFreeze is supposed to be mutable state used for one single page,
though. Seems like we should use a separate immutable struct for this
instead.

I've already prototyped a dedicated immutable "cutoffs" struct, which is
instantiated exactly once per VACUUM. Seems like a good approach to me. The
immutable state can be shared by heapam.c's heap_prepare_freeze_tuple(),
vacuumlazy.c, and even vacuum_set_xid_limits() -- so everybody can work off
of the same struct directly. Will try to get that into shape for the next
revision.
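To make that a little more concrete, the kind of thing I'm imagining is
roughly the following (a minimal sketch only -- the struct name and the exact
field set are provisional, not what's in the attached v7 patches):

    /*
     * Sketch: cutoffs established once at the start of a VACUUM operation,
     * then treated as read-only by vacuumlazy.c and heap_prepare_freeze_tuple()
     */
    typedef struct VacuumCutoffs
    {
        /* pg_class.relfrozenxid/relminmxid values at the start of VACUUM */
        TransactionId   relfrozenxid;
        MultiXactId     relminmxid;

        /* Removal/freezing never needs to look past these */
        TransactionId   OldestXmin;
        MultiXactId     OldestMxact;

        /* XIDs/MXIDs older than these force freezing (lazy strategy) */
        TransactionId   FreezeLimit;
        MultiXactId     MultiXactCutoff;
    } VacuumCutoffs;

The point is just that every consumer reads the same read-only cutoffs,
rather than each function taking them as a pile of separate arguments.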
> What does 'xtrack' stand for? Xid Tracking?

Yes.

> >  * VACUUM caller must assemble HeapFreezeTuple entries for every tuple that we
> >  * returned true for when called. A later heap_freeze_execute_prepared call
> > - * will execute freezing for caller's page as a whole.
> > + * will execute freezing for caller's page as a whole. Caller should also
> > + * initialize xtrack fields for page as a whole before calling here with first
> > + * tuple for the page. See page_frozenxid_tracker comments.
>
> s/should/need to/?

Changed it to "must".

> page_frozenxid_tracker appears to be a dangling pointer.

I think that you mean that the code comments reference an obsolete type name
-- fixed.

> > + * VACUUM calls limit_xid "FreezeLimit", and cutoff_xid "OldestXmin".
> > + * (limit_multi is "MultiXactCutoff", and cutoff_multi "OldestMxact".)
>
> Hm. Perhaps we should just rename them if it requires this kind of
> explanation? They're really not good names.

Agreed -- this can be taken care of as part of using a new VACUUM
operation-level struct that is passed as immutable state, which I went into a
moment ago. That centralizes the definitions, which makes it far easier to
understand which cutoff is which. For now I've kept the names as they were.

> Could use TransactionIdOlder().

I suppose, but the way I've done it feels a bit more natural to me, and
appears more often elsewhere. Not sure.

> > @@ -6563,8 +6564,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
> >           */
> >          Assert(!freeze_xmax);
> >          Assert(TransactionIdIsValid(newxmax));
> > -        if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
> > -            *relfrozenxid_out = newxmax;
> > +        Assert(heap_tuple_would_freeze(tuple, limit_xid, limit_multi,
> > +                                       &xtrack->relfrozenxid_nofreeze_out,
> > +                                       &xtrack->relminmxid_nofreeze_out));
> > +        if (TransactionIdPrecedes(newxmax, xtrack->relfrozenxid_out))
> > +            xtrack->relfrozenxid_out = newxmax;
>
> Perhaps the Assert(heap_tuple_would_freeze()) bit could be handled once at the
> end of the routine, for all paths?

The problem with that is that we cannot Assert() when we're removing a Multi
via FRM_INVALIDATE_XMAX processing in certain cases (I tried it this way
myself, and the assertion fails there). This can happen when the call to
FreezeMultiXactId() for the xmax determined that we should do
FRM_INVALIDATE_XMAX processing for the xmax due to the Multi being
"isLockOnly" and preceding "OldestVisibleMXactId[MyBackendId]". Which is
relatively common.

I fixed this by moving the assert further down, while still only checking the
FRM_RETURN_IS_XID and FRM_RETURN_IS_MULTI cases.

> Oh - I totally didn't realize that ->freeze is an out parameter. Seems a bit
> odd to have the other fields suffied with _out but not this one?

Fixed this by not having an "_out" suffix for any of these mutable fields
from HeapPageFreeze. Now everything is consistent. (The "_out" convention is
totally redundant, now that we have the HeapPageFreeze struct, which makes it
obvious that it is all mutable state.)

> Won't using OldestXmin instead of FreezeLimit potentially cause additional
> conflicts? Is there any reason to not compute an accurate value?

This is a concern that I share. I was hoping that I'd be able to get away
with using OldestXmin just for this, because it's simpler that way. But I had
my doubts about it already.

I wonder why it's correct to use FreezeLimit for this on HEAD, though. What
about those FRM_INVALIDATE_XMAX cases that I just mentioned we couldn't
Assert() on? That case effectively removes XIDs that might be well after
FreezeLimit. Granted it might be safe in practice, but it's far from obvious
why it is safe.

Perhaps we can fix this in a not-too-invasive way by reusing
LVPagePruneState.visibility_cutoff_xid for FREEZE_PAGE conflicts (not just
VISIBLE conflicts) in cases where that was possible (while still using
OldestXmin as a fallback in much rarer cases). In practice we're only
triggering freezing eagerly because the page is already expected to be set
all-visible (the whole point is that we'd prefer if it was set all-frozen
instead of all-visible). (I've not done this in v7, but it's on my TODO
list.)
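Roughly the kind of thing I mean, in lazy_scan_prune terms (a sketch of the
idea only, not something v7 does; names are approximate):

    /*
     * Sketch: pick a tighter snapshotConflictHorizon for the freeze WAL
     * record when the page is expected to become all-visible, which is the
     * common case for eager freezing
     */
    TransactionId   frz_conflict_horizon;

    if (prunestate->all_visible)
        frz_conflict_horizon = prunestate->visibility_cutoff_xid;
    else
        frz_conflict_horizon = vacrel->OldestXmin;  /* rare fallback */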
Note that the patch already maintains LVPagePruneState.visibility_cutoff_xid
when there are some LP_DEAD items on the page, because we temporarily ignore
those LP_DEAD items when considering the eager freezing stuff...

> >             if (ItemIdIsDead(itemid))
> >             {
> >                 deadoffsets[lpdead_items++] = offnum;
> > -               prunestate->all_visible = false;
> > -               prunestate->has_lpdead_items = true;
> >                 continue;
> >             }
>
> What does this have to do with the rest of the commit? And why are we doing
> this?

...which is what you're asking about here. The eager freezing strategy
triggers page-level freezing for any page that is about to become
all-visible, so that it can be set all-frozen instead. But that's not
entirely straightforward when there happen to be some LP_DEAD items on the
heap page.

There are really two ways that a page can become all-visible during VACUUM,
and we want to account for that here. With eager freezing we want to make the
pages become all-frozen instead of just all-visible, regardless of which heap
pass (first or second) ends up setting the page all-visible (and maybe even
all-frozen). The comments that you mention were moved around a bit in
passing.

Note that we still set prunestate->all_visible to false inside
lazy_scan_prune when we see remaining LP_DEAD stub items. We just do it later
on, after we've decided on freezing stuff. (Obviously it wouldn't be okay to
return to lazy_scan_heap without unsetting prunestate->all_visible if there
are LP_DEAD items.)

> Seems quite confusing to enter a block with described as "We're freezing the
> page." when we're not freezing anything (tuples_frozen == 0).
> I don't really get what freeze_all_eligible is trying to do.

freeze_all_eligible (and the "tuples_frozen == 0" behavior) are both there
because we can mark a page as all-frozen in the VM without freezing any of
its tuples first. When that happens, we must make sure that
"prunestate->all_frozen" is set to true, so that we'll actually set the
all-frozen bit.

At the same time, we need to be careful about the case where we *could* set
the page all-frozen if we decided to freeze all eligible tuples -- we need to
handle the case where we choose against freezing (and so can't set the
all-frozen bit in the VM, and so must actually set "prunestate->all_frozen"
to false). This is all kinda tricky because we're simultaneously dealing with
the actual state of the page, and the anticipated state of the page in the
near future. Closely related concepts, but distinct in important ways.

> > #ifdef USE_ASSERT_CHECKING
> >     /* Note that all_frozen value does not matter when !all_visible */
> > -   if (prunestate->all_visible)
> > +   if (prunestate->all_visible && lpdead_items == 0)
> >     {
> >         TransactionId cutoff;
> >         bool        all_frozen;
> > @@ -1849,8 +1876,7 @@ retry:
> >         if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
> >             Assert(false);
>
> Not related to this change, but why isn't this just
> Assert(heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))?

It's just a matter of personal preference. I prefer to have a clear block of
related code that contains multiple related assertions. You would probably
have declared PG_USED_FOR_ASSERTS_ONLY variables at the top of
lazy_scan_prune instead.

FWIW if you did it the other way the assertion would actually have to include
a "!prunestate->all_visible" test that short circuits the
heap_page_is_all_visible() call from the Assert().
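For comparison, the one-liner version would have to look roughly like this
(sketch only, with cutoff/all_frozen declared PG_USED_FOR_ASSERTS_ONLY at the
top of lazy_scan_prune):

    /* Short circuit keeps us from calling heap_page_is_all_visible() when
     * the page isn't expected to be all-visible */
    Assert(!prunestate->all_visible || lpdead_items > 0 ||
           heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen));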
> > Subject: [PATCH v6 2/6] Teach VACUUM to use visibility map snapshot.
>
> This should include a description of the memory usage effects.

The visibilitymap.c side of this is the least worked out part of the patch
series, by far. I have deliberately put off work on the data structure
itself, preferring to focus on the vacuumlazy.c side of things for the time
being. But I still agree -- fixed by acknowledging that that particular
aspect of resource management is unresolved.

I did have an open TODO before in the commit message, which is now improved
based on your feedback: it now fully owns the fact that we really ignore the
impact on memory usage right now. Just because that part is very WIP (much
more so than every other part).

> > This has significant advantages over the previous approach of using the
> > authoritative VM fork to decide on which pages to skip. The number of
> > heap pages processed will no longer increase when some other backend
> > concurrently modifies a skippable page, since VACUUM will continue to
> > see the page as skippable (which is correct because the page really is
> > still skippable "relative to VACUUM's OldestXmin cutoff").
>
> Why is it an advantage for the number of pages to not increase?

The commit message goes into that immediately after the last line that you
quoted. :-)

Having an immutable structure will help us, both in the short term, for this
particular project, and in the long term, for other VACUUM enhancements. We
need to have something that drives the cost model in vacuumlazy.c for the
skipping strategy stuff -- we need to have advanced information about costs
that drive the decision making process.

Thanks to VM snapshots, the cost model is able to reason about the cost of
relfrozenxid advancement precisely, in terms of "extra" scanned_pages implied
by advancing relfrozenxid during this VACUUM. That level of precision is
pretty nice IMV. It's not strictly necessary, but it's nice to be able to
make a precise, accurate comparison between each of the two skipping
strategies.

Did you happen to look at the 6th and final patch? It's trivial, but can have
a big impact. It sizes dead_items while capping its size based on
scanned_pages, not based on rel_pages. That's obviously guaranteed to be
correct.
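In other words, the final cap on the dead_items allocation becomes something
along these lines (sketch; variable names approximate, the existing
memory-based caps stay as they are today):

    /* Never allocate more dead_items space than the scanned pages could need */
    max_items = Min(max_items, (int64) scanned_pages * MaxHeapTuplesPerPage);

LP_DEAD items can only ever be found on pages that we actually scan, so the
tighter cap can't lose anything.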
Note also that the 2nd patch teaches VACUUM VERBOSE to report the final
number of scanned_pages right at the start, before scanning anything -- so
it's a useful basis for much better progress reporting in
pg_stat_progress_vacuum. Stuff like that also becomes very easy with VM
snapshots.

Then there is the more ambitious stuff, that's not in scope for this project.
Example: Perhaps Sawada-san will be able to take the concept of visibility
map snapshots, and combine it with his radix tree design -- which could
presumably benefit from advance knowledge of which pages can be scanned. This
is information that is reliable, by definition.

In fact I think that it would make a lot of sense for this visibility map
snapshot data structure to be exactly the same structure used to store
dead_items. They really are kind of the same thing. The design can reason
precisely about which heap pages can ever end up having any LP_DEAD items.
(It's already trivial to use the VM snapshot infrastructure as a precheck
cache for dead_items lookups.)

> > Non-aggressive VACUUMs now explicitly commit to (or decide against)
> > early relfrozenxid advancement up-front.
>
> Why?

We can advance relfrozenxid because it's cheap to, or because it's urgent
(according to autovacuum_freeze_max_age). This is kind of true on HEAD
already due to the autovacuum_freeze_max_age "escalate to aggressive" thing
-- but we can do much better than that. Why not decide to advance
relfrozenxid when (say) it's only *starting* to get urgent, if it happens to
be relatively cheap (though not dirt cheap)? We make relfrozenxid advancement
a deliberate decision that weighs *all* available information, and has a
sense of the needs of the table over time.

The user experience is important here. Going back to a model where there is
really just one kind of lazy VACUUM makes a lot of sense. We should have much
more approximate guarantees about relfrozenxid advancement, since that's what
gives us the flexibility to find a cheaper (or more stable) way of keeping up
over time. It matters that we keep up over time, but it doesn't matter if we
fall behind on relfrozenxid advancement -- at least not if we don't also fall
behind on the work of freezing physical heap pages.

> > VACUUM will now either scan
> > every all-visible page, or none at all. This replaces lazy_scan_skip's
> > SKIP_PAGES_THRESHOLD behavior, which was intended to enable early
> > relfrozenxid advancement (see commit bf136cf6), but left many of the
> > details to chance.
>
> The main goal according to bf136cf6 was to avoid defeating OS readahead, so I
> think it should be mentioned here.

Agreed. Fixed.

> To me this is something that ought to be changed separately from the rest of
> this commit.

Maybe, but I'd say it depends on the final approach taken -- the
visibilitymap.c aspects of the patch are the least settled. I am seriously
considering adding prefetching to the vm snapshot structure, which would make
it very much a direct replacement for SKIP_PAGES_THRESHOLD.

Separately, I'm curious about what you think of VM snapshots from an aio
point of view. Seems like it would be ideal for prefetching with aio?

> > TODO: We don't spill VM snapshots to disk just yet (resource management
> > aspects of VM snapshots still need work). For now a VM snapshot is just
> > a copy of the VM pages stored in local buffers allocated by palloc().
>
> HEAPBLOCKS_PER_PAGE is 32672 with the defaults. The maximum relation size is
> 2**32 - 1 blocks. So the max FSM size is 131458 pages, a bit more than 1GB. Is
> that correct?

I think that you meant "max VM size". That sounds correct to me.
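For the archives, the arithmetic (assuming the usual 8KB pages, 24 byte page
header, and 2 VM bits per heap block):

    heap blocks per VM page = (8192 - 24) * 4 = 32672
    VM pages for 2^32 - 1 heap blocks = ceil(4294967295 / 32672) = 131458
    131458 * 8192 bytes = ~1.08 GB

So the worst case for a palloc()'d VM snapshot copy is a little over 1GB.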
> For large relations that are already nearly all-frozen this does add a
> noticable amount of overhead, whether spilled to disk or not. Of course
> they're also not going to be vacuumed super often, but ...

I wouldn't be surprised if the patch didn't work with relations that approach
32 TiB in size. As I said, the visibilitymap.c data structure is the least
worked out piece of the project.

> Perhaps worth turning the VM into a range based description for the snapshot,
> given it's a readonly datastructure in local memory? And we don't necessarily
> need the all-frozen and all-visible in memory, one should suffice? We don't
> even need random access, so it could easily be allocated incrementally, rather
> than one large allocation.

Definitely think that we should do simple run-length encoding, stuff like
that. Just as long as it allows vacuumlazy.c to work off of a true snapshot,
with scanned_pages known right from the start. The consumer side of things
has been my focus so far.

> Hm. It's a bit sad to compute the snapshot after determining OldestXmin.
>
> We probably should refresh OldestXmin periodically. That won't allow us to get
> a more aggressive relfrozenxid, but it'd allow to remove more gunk.

That may well be a good idea, but I think that it's also a good idea to just
not scan heap pages that we know won't have XIDs < OldestXmin (OldestXmin at
the start of the VACUUM). That visibly makes the problem of "recently dead"
tuples that cannot be cleaned up a lot better, without requiring that we do
anything with OldestXmin.

I also think that there is something to be said for not updating the FSM for
pages that were all-visible at the beginning of the VACUUM operation. VACUUM
is currently quite happy to update the FSM with its own confused idea about
how much free space there really is on heap pages with recently dead (dead
but not yet removable) tuples. That's really bad, but really subtle.

> What does it mean to "skip lazily"?

Skipping even all-visible pages, prioritizing avoiding work over advancing
relfrozenxid. This is a cost-based decision. As I mentioned a moment ago,
that's one immediate use of VM snapshots (it gives us precise information to
base our decision on, that simply *cannot* become invalid later on).

> > +   /*
> > +    * Visibility map page copied to local buffer for caller's snapshot.
> > +    * Caller requires an exact count of all-visible and all-frozen blocks
> > +    * in the heap relation. Handle that now.
>
> This part of the comment seems like it actually belongs further down?

No, it just looks a bit like that because of the "truncate in-memory VM" code
stanza. It's actually the right order.

> > +    * Must "truncate" our local copy of the VM to avoid incorrectly
> > +    * counting heap pages >= rel_pages as all-visible/all-frozen. Handle
> > +    * this by clearing irrelevant bits on the last VM page copied.
> > +    */
>
> Hm - why would those bits already be set?

No real reason, we "truncate" like this defensively. This will probably look
quite different before too long.

> > Subject: [PATCH v6 3/6] Add eager freezing strategy to VACUUM.
>
> What's the logic behind a hard threshold? Suddenly freezing everything on a
> huge relation seems problematic. I realize that never getting all that far
> behind is part of the theory, but I don't think that's always going to work.

It's a vast improvement on what we do currently, especially in append-only
tables. There is simply no limit on how many physical heap pages will have to
be frozen when there is an aggressive mode VACUUM. It could be terabytes,
since table age predicts precisely nothing about costs. With the patch we
have a useful limit for the first time, one that uses physical units (the
only kind of units that make any sense).

Admittedly we should really have special instrumentation that reports when
VACUUM must do "catch up freezing" when vacuum_freeze_strategy_threshold is
first crossed, to help users to make better choices in this area. And maybe
vacuum_freeze_strategy_threshold should be lower by default, so it's not as
noticeable. (The GUC partly exists as a compatibility option, a bridge to the
old lazy behavior.)

Freezing just became approximately 5x cheaper with the freeze plan
deduplication work (commit 9e540599). To say nothing of how vacuuming indexes
became a lot cheaper in recent releases. So to some extent we can afford to
be more proactive here.

There are some very nonlinear cost profiles involved here due to write
amplification effects.
So having a strong bias against write amplification seems totally reasonable
to me -- we can potentially "get it wrong" and still come out ahead, because
we at least had the right idea about costs. I don't deny that there are clear
downsides, though. I am convinced that it's worth it -- performance stability
is what users actually complain about in almost all cases. Why should
performance stability be 100% free?

> Wouldn't a better strategy be to freeze a percentage of the relation on every
> non-aggressive vacuum? That way the amount of work for an eventual aggressive
> vacuum will shrink, without causing individual vacuums to take extremely long.

I think that it's better to avoid aggressive mode altogether. By committing
to advancing relfrozenxid by *some* amount in ~all VACUUMs against larger
tables, we can notice when we don't actually need to do very much freezing to
keep relfrozenxid current, due to workload characteristics. It depends on
workload, of course. But if we don't try to do this we'll never notice that
it's possible to do it. Why should we necessarily need to freeze very much,
after a while? Why shouldn't most newly frozen pages stay frozen ~forever
after a little while?

> The other thing that I think would be to good to use is a) whether the page
> is already in s_b, and b) whether the page already is dirty. The cost of
> freezing shrinks significantly if it doesn't cause an additional read +
> write. And that additional IO IMO one of the major concerns with freezing
> much more aggressively in OLTPish workloads where a lot of the rows won't
> ever get old enough to need freezing.

Maybe, but I think that systematic effects are more important. We freeze
eagerly during this VACUUM in part because it makes relfrozenxid advancement
possible in the next VACUUM.

Note that eager freezing doesn't freeze the page unless it's already going to
set it all-visible. That's another way in which we ameliorate the problem of
freezing when it makes little sense to -- even with the eager freezing
strategy, we *don't* freeze heap pages where it obviously makes little sense
to. Which makes a huge difference on its own.

There is good reason to believe that most individual heap pages are very cold
data, even in OLTP apps. To a large degree Postgres is successful because it
is good at inexpensively storing data that will possibly never be accessed:

https://www.microsoft.com/en-us/research/video/cost-performance-in-modern-data-stores-how-data-cashing-systems-succeed/

Speaking of OLTP apps: in many cases VACUUM will prune just to remove one or
two heap-only tuples, maybe even generating an FPI in the process. But the
removed tuple wasn't actually doing any harm -- an opportunistic prune could
have done the same thing later on, once we'd built up some more garbage
tuples. So the only reason to prune is to freeze the page. And yet right now
we don't recognize this and freeze the page to get *some* benefit out of the
arguably needless prune. This is quite common, in fact.

> The most significant aspect of anti-wrap autvacuums right now is that they
> don't auto-cancel. Is that still used? If so, what's the threshold?

This patch set doesn't change anything about antiwraparound autovacuums --
though it does completely eliminate aggressive mode (so it's a little like
Postgres 8.4).
There is a separate thread discussing the antiwraparound side of this,
actually:

https://postgr.es/m/CAH2-Wz=S-R_2rO49Hm94Nuvhu9_twRGbTm6uwDRmRu-Sqn_t3w@mail.gmail.com

I think that I will need to invent a new type of autovacuum that's similar to
antiwraparound autovacuum, but doesn't have the cancellation behavior -- that
is more or less prerequisite to committing this patch series. We can accept
some risk of relfrozenxid falling behind if that doesn't create any real risk
of antiwraparound autovacuums.

We can retain antiwraparound autovacuum, which should kick in only when the
new kind of autovacuum has failed to advance relfrozenxid, having had the
opportunity. Maybe antiwraparound autovacuum should be triggered when
age(relfrozenxid) is twice the value of autovacuum_freeze_max_age. The new
kind of autovacuum would trigger at the same table age that triggers
antiwraparound autovacuum with the current design. So antiwraparound
autovacuum would work in the same way, but would be much less common -- even
for totally static tables.

We'd at least be sure that the auto cancellation behavior was *proportionate*
to the problem at hand, because we'll always have tried and ultimately failed
to advance relfrozenxid without activating the auto cancellation behavior. We
wouldn't trigger a very disruptive behavior routinely, without any very good
reason.

> > Now every VACUUM might need to wait for a cleanup lock, though few will.
> > It can only happen when required to advance relfrozenxid to no less than
> > half way between the existing relfrozenxid and nextXID.
>
> Where's that "halfway" bit coming from?

We don't use FreezeLimit within lazy_scan_noprune in the patch that gets rid
of aggressive mode VACUUM. We use something called minXid in its place. So a
different timeline to freezing (even for tables where we always use the lazy
freezing strategy).

The new minXid cutoff (used by lazy_scan_noprune) comes from this point in
vacuum_set_xid_limits():

+   *minXid = nextXID - (freeze_table_age / 2);
+   if (!TransactionIdIsNormal(*minXid))
+       *minXid = FirstNormalTransactionId;

So that's what I meant by "half way". (Note that minXid is guaranteed to be
<= FreezeLimit, which is itself guaranteed to be <= OldestXmin, no matter
what.)
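To put rough numbers on it, with the default vacuum_freeze_table_age of 150
million (and assuming it isn't clamped by autovacuum_freeze_max_age):

    minXid = nextXID - (150 million / 2) = nextXID - 75 million

If the VACUUM was triggered by table age in the first place, relfrozenxid
will be roughly 150 million XIDs behind nextXID, so minXid lands about half
way between the existing relfrozenxid and nextXID -- hence the wording in the
commit message.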
> Isn't "half way between the relfrozenxid and nextXID" a problem for instances
> with longrunning transactions?

Should we do less relfrozenxid advancement because there is a long running
transaction, though? It's obviously seriously bad when things are blocked by
a long running transaction, but I don't see the connection between that and
how we wait for cleanup locks. Waiting for cleanup locks is always really,
really bad, and can be avoided in almost all cases.

I suspect that I still haven't been aggressive enough in how minXid is set,
BTW -- we should be avoiding waiting for a cleanup lock like the plague. So
"half way" isn't enough. Maybe we should have a LOG message in cases where it
actually proves necessary to wait, because it's just asking for trouble (at
least when we're running in autovacuum).

> Wouldn't this mean that wait for every page if
> relfrozenxid can't be advanced much because of a longrunning query or such?

Old XIDs always start out as young XIDs. Which we're now quite willing to
freeze when conditions look good. Page level freezing always freezes all
eligible XIDs on the page when triggered, no matter what the details may be.

This means that the oldest XID on a heap page is more or less always an XID
that's after whatever OldestXmin was for the last VACUUM that ran and froze
the page, whenever that happened, and regardless of the mix of XID ages on
the page at that time. As a consequence, lone XIDs that are far older than
other XIDs on the same page become much rarer than what you'd see with the
current design -- they have to "survive" multiple VACUUMs, not just one
VACUUM.

The best predictor of XID age becomes the time that VACUUM last froze the
page as a whole -- so workload characteristics and natural variations are
much, much less likely to lead to problems from waiting for cleanup locks.
(Of course it also helps that we'll try really hard to avoid waiting, and
almost always prefer lazy_scan_noprune processing.)

There is some sense in which we're trying to create a virtuous cycle here. If
we are always in a position to advance relfrozenxid by *some* amount each
VACUUM, however small, then we will have many individual opportunities
(spaced out over multiple VACUUM operations) to freeze tuples on any heap
pages that (for whatever reason) are harder to get a cleanup lock on, and
then catch up on relfrozenxid by a huge amount whenever we "get lucky". We
have to "keep an open mind" to ever have any chance of "getting lucky" in
this sense, though.

> > VACUUM is the only
> > mechanism that can claw back MultixactId space, so allowing VACUUM to
> > consume MultiXactId space (for any reason) adds to the risk that the
> > system will trigger the multiStopLimit wraparound protection mechanism.
>
> Strictly speaking that's not quite true, you can also drop/truncate tables ;)

Fixed.

--
Peter Geoghegan
Attachment
- v7-0005-Avoid-allocating-MultiXacts-during-VACUUM.patch
- v7-0002-Teach-VACUUM-to-use-visibility-map-snapshot.patch
- v7-0006-Size-VACUUM-s-dead_items-space-using-VM-snapshot.patch
- v7-0004-Make-VACUUM-s-aggressive-behaviors-continuous.patch
- v7-0003-Add-eager-freezing-strategy-to-VACUUM.patch
- v7-0001-Add-page-level-freezing-to-VACUUM.patch