From ec184e3b5e44c93b0938e8f8b27d73642c8fb479 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sun, 12 Jun 2022 15:46:08 -0700
Subject: [PATCH v11 1/4] Add page-level freezing to VACUUM.

Teach VACUUM to decide on whether or not to trigger freezing at the
level of whole heap pages, not individual tuple fields.  OldestXmin is
now treated as the cutoff for freezing eligibility in all cases, while
FreezeLimit is used to trigger freezing at the level of each page (we
now freeze all eligible XIDs on a page when freezing is triggered for
the page).

Making the choice to freeze work at the page level tends to result in
VACUUM writing less WAL in the long term.  This is especially likely to
work due to complementary effects with the freeze plan WAL
deduplication optimization added by commit 9e540599.

Also teach VACUUM to trigger page-level freezing whenever it detects
that heap pruning generated an FPI as torn page protection.  We'll have
already written a large amount of WAL just to do that much, so it's
very likely a good idea to get freezing out of the way for the page
early.  This only happens in cases where it will directly lead to
marking the page all-frozen in the visibility map.

In most cases "freezing a page" removes all XIDs < OldestXmin, and all
MXIDs < OldestMxact.  It doesn't quite work that way in certain rare
cases involving MultiXacts, though.  It is convenient to define "freeze
the page" in a way that gives FreezeMultiXactId the leeway to put off
the work of processing an individual tuple's xmax whenever it happens
to be a MultiXactId that would require an expensive second pass to
process aggressively (allocating a new Multi is especially worth
avoiding here).  FreezeMultiXactId effectively makes a decision on how
to proceed with processing at the level of each individual xmax field.
Its no-op multi processing "freezes" an xmax in the event of an
expensive-to-process xmax on a page when (for whatever reason)
page-level freezing triggers.
If, on the other hand, freezing is not triggered for the page, then
page-level no-op processing takes care of the multi for us instead.
Either way, the remaining Multi will ratchet back VACUUM's relfrozenxid
and/or relminmxid trackers as required, and we won't need an expensive
second pass over the multi (unless we really have no choice, for
example during a VACUUM FREEZE, where FreezeLimit always matches
OldestXmin).

This allows vacuumlazy.c to think of freezing as something that happens
at the page level, or not at all -- without concerning itself with any
of these details.  It largely cedes control of decisions about freezing
and relfrozenxid/relminmxid to the heapam.c freezing routines (routines
like heap_prepare_freeze_tuple and FreezeMultiXactId), which now have
all of the context needed to make decisions about freezing and how it
may affect relfrozenxid and relminmxid advancement.  vacuumlazy.c is
now free to focus on the big picture around freezing physical heap
pages.

Later work will add an eager freezing strategy to VACUUM (and recast
the behavior established in this commit as lazy freezing, though it
isn't quite as lazy as the historic tuple-oriented approach to
freezing).

Making freezing work at the page level is not just an optimization;
it's also a useful basis for modeling costs at the whole table level,
since it makes the visibility map a more reliable indicator of just how
far behind we are on freezing at the level of the whole table.  Later
work that adds explicit eager and lazy scanning strategies will build
on this in order to teach VACUUM to advance relfrozenxid earlier and
much more frequently than before.
Author: Peter Geoghegan
Reviewed-By: Jeff Davis
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
---
 src/include/access/heapam.h          |  82 +++++-
 src/backend/access/heap/heapam.c     | 388 +++++++++++++++------------
 src/backend/access/heap/pruneheap.c  |  16 +-
 src/backend/access/heap/vacuumlazy.c | 132 ++++++---
 doc/src/sgml/config.sgml             |  11 +-
 5 files changed, 397 insertions(+), 232 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 53eb01176..0782fed14 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -113,6 +113,71 @@ typedef struct HeapTupleFreeze
 	OffsetNumber offset;
 } HeapTupleFreeze;
 
+/*
+ * State used by VACUUM to track the details of freezing all eligible tuples
+ * on a given heap page.
+ *
+ * VACUUM prepares freeze plans for each page via heap_prepare_freeze_tuple
+ * calls (every tuple with storage gets its own call).  This page-level freeze
+ * state is updated across each call, which ultimately determines whether or
+ * not freezing the page is required.  (VACUUM freezes the page via a call to
+ * heap_freeze_execute_prepared, which freezes using prepared freeze plans.)
+ *
+ * Aside from the basic question of whether or not freezing will go ahead, the
+ * state also tracks the oldest extant XID/MXID in the table as a whole, for
+ * the purposes of advancing relfrozenxid/relminmxid values in pg_class later
+ * on.  Each heap_prepare_freeze_tuple call pushes NewRelfrozenXid and/or
+ * NewRelminMxid back as required to avoid unsafe final pg_class values.  Any
+ * and all unfrozen XIDs or MXIDs that remain after VACUUM finishes _must_
+ * have values >= the final relfrozenxid/relminmxid values in pg_class.  This
+ * includes XIDs that remain as MultiXact members from any tuple's xmax.
+ *
+ * When the 'freeze_required' flag isn't set after all tuples are examined,
+ * the final choice on freezing is made by vacuumlazy.c.  It can decide to
+ * trigger freezing based on whatever criteria it deems appropriate.  However,
+ * it is highly recommended that vacuumlazy.c avoid freezing any page that
+ * cannot be marked all-frozen in the visibility map afterwards.
+ *
+ * Freezing is typically optional for most individual pages scanned during any
+ * given VACUUM operation.  This allows vacuumlazy.c to manage the cost of
+ * freezing at the level of the entire VACUUM operation/entire heap relation.
+ */
+typedef struct HeapPageFreeze
+{
+	/* Is heap_prepare_freeze_tuple caller required to freeze page? */
+	bool		freeze_required;
+
+	/*
+	 * "No freeze" NewRelfrozenXid/NewRelminMxid trackers.
+	 *
+	 * These trackers are maintained in the same way as the trackers used when
+	 * VACUUM scans a page that isn't cleanup locked.  Both code paths are
+	 * based on the same general idea (do less work for this page during the
+	 * ongoing VACUUM, at the cost of having to accept older final values).
+	 */
+	TransactionId NoFreezePageRelfrozenXid;
+	MultiXactId NoFreezePageRelminMxid;
+
+	/*
+	 * Trackers used when heap_freeze_execute_prepared freezes the page.
+	 *
+	 * When we freeze a page, we generally freeze all XIDs < OldestXmin, only
+	 * leaving behind XIDs that are ineligible for freezing, if any.  And so
+	 * you might wonder why these trackers are necessary at all; why should
+	 * _any_ page that VACUUM freezes _ever_ be left with XIDs/MXIDs that
+	 * ratchet back the rel-level NewRelfrozenXid/NewRelminMxid trackers?
+	 *
+	 * It is useful to use a definition of "freeze the page" that does not
+	 * overspecify how MultiXacts are affected.  heap_prepare_freeze_tuple
+	 * generally prefers to remove Multis eagerly, but lazy processing is used
+	 * in cases where laziness allows VACUUM to avoid allocating a new Multi.
+	 * The "freeze the page" trackers enable this flexibility.
+	 */
+	TransactionId FreezePageRelfrozenXid;
+	MultiXactId FreezePageRelminMxid;
+
+} HeapPageFreeze;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -180,19 +245,18 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 extern void heap_inplace_update(Relation relation, HeapTuple tuple);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  const struct VacuumCutoffs *cutoffs,
-									  HeapTupleFreeze *frz, bool *totally_frozen,
-									  TransactionId *relfrozenxid_out,
-									  MultiXactId *relminmxid_out);
+									  HeapPageFreeze *pagefrz,
+									  HeapTupleFreeze *frz, bool *totally_frozen);
 extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
-										 TransactionId FreezeLimit,
+										 TransactionId snapshotConflictHorizon,
 										 HeapTupleFreeze *tuples, int ntuples);
 extern bool heap_freeze_tuple(HeapTupleHeader tuple,
 							  TransactionId relfrozenxid,
 							  TransactionId relminmxid,
 							  TransactionId FreezeLimit,
 							  TransactionId MultiXactCutoff);
-extern bool heap_tuple_would_freeze(HeapTupleHeader tuple,
-									const struct VacuumCutoffs *cutoffs,
-									TransactionId *relfrozenxid_out,
-									MultiXactId *relminmxid_out);
+extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
+									 const struct VacuumCutoffs *cutoffs,
+									 TransactionId *NoFreezePageRelfrozenXid,
+									 MultiXactId *NoFreezePageRelminMxid);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
@@ -210,7 +274,7 @@ extern int	heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							TransactionId old_snap_xmin,
 							TimestampTz old_snap_ts,
-							int *nnewlpdead,
+							int *nnewlpdead, bool *prune_fpi,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
 									OffsetNumber *redirected, int nredirected,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 86a88de85..dae3f26ce 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6098,9 +6098,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
  * MultiXactId.
  *
  * "flags" is an output value; it's used to tell caller what to do on return.
- *
- * "mxid_oldest_xid_out" is an output value; it's used to track the oldest
- * extant Xid within any Multixact that will remain after freezing executes.
+ * "pagefrz" is an input/output value, used to manage page level freezing.
  *
  * Possible values that we can set in "flags":
  * FRM_NOOP
@@ -6115,16 +6113,34 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
  * The return value is a new MultiXactId to set as new Xmax.
  * (caller must obtain proper infomask bits using GetMultiXactIdHintBits)
  *
- * "mxid_oldest_xid_out" is only set when "flags" contains either FRM_NOOP or
- * FRM_RETURN_IS_MULTI, since we only leave behind a MultiXactId for these.
+ * Caller delegates control of page freezing to us.  In practice we always
+ * force freezing of caller's page unless FRM_NOOP processing is indicated.
+ * We help caller ensure that XIDs < FreezeLimit and MXIDs < MultiXactCutoff
+ * can never be left behind.  We freely choose when and how to process each
+ * Multi, without ever violating the cutoff postconditions for freezing.
  *
- * NB: Creates a _new_ MultiXactId when FRM_RETURN_IS_MULTI is set in "flags".
+ * It's useful to remove Multis on a proactive timeline (relative to freezing
+ * XIDs) to keep MultiXact member SLRU buffer misses to a minimum.  It can also
+ * be cheaper in the short run, for us, since we too can avoid SLRU buffer
+ * misses through eager processing.
+ *
+ * NB: Creates a _new_ MultiXactId when FRM_RETURN_IS_MULTI is set, though only
+ * when FreezeLimit and/or MultiXactCutoff cutoffs leave us with no choice.
+ * This can usually be put off, which is usually enough to avoid it altogether.
+ *
+ * NB: Caller must maintain "no freeze" NewRelfrozenXid/NewRelminMxid trackers
+ * using heap_tuple_should_freeze when we haven't forced page-level freezing.
+ *
+ * NB: Caller should avoid needlessly calling heap_tuple_should_freeze when we
+ * have already forced page-level freezing, since that might incur the same
+ * SLRU buffer misses that we specifically intended to avoid by freezing.
  */
 static TransactionId
-FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
+FreezeMultiXactId(MultiXactId multi, HeapTupleHeader tuple,
 				  const struct VacuumCutoffs *cutoffs, uint16 *flags,
-				  TransactionId *mxid_oldest_xid_out)
+				  HeapPageFreeze *pagefrz)
 {
+	uint16		t_infomask = tuple->t_infomask;
 	TransactionId newxmax = InvalidTransactionId;
 	MultiXactMember *members;
 	int			nmembers;
@@ -6134,7 +6150,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
-	TransactionId temp_xid_out;
+	TransactionId FreezePageRelfrozenXid = pagefrz->FreezePageRelfrozenXid;
+	TransactionId axid PG_USED_FOR_ASSERTS_ONLY = cutoffs->OldestXmin;
+	MultiXactId amxid PG_USED_FOR_ASSERTS_ONLY = cutoffs->OldestMxact;
 
 	*flags = 0;
 
@@ -6146,14 +6164,16 @@
 	{
 		/* Ensure infomask bits are appropriately set/reset */
 		*flags |= FRM_INVALIDATE_XMAX;
-		return InvalidTransactionId;
+		pagefrz->freeze_required = true;
+		Assert(!TransactionIdIsValid(newxmax));
+		return newxmax;
 	}
 	else if (MultiXactIdPrecedes(multi, cutoffs->relminmxid))
 		ereport(ERROR,
 				(errcode(ERRCODE_DATA_CORRUPTED),
 				 errmsg_internal("found multixact %u from before relminmxid %u",
 								 multi, cutoffs->relminmxid)));
-	else if (MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff))
+	else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))
 	{
 		/*
 		 * This old multi cannot possibly have members still running, but
 		 * verify just in case.  If it was a locker only, it can be removed
 		 * without any further consideration; but if it contained an update,
 		 * we might need to preserve it.
 		 */
 		if (MultiXactIdIsRunning(multi,
 								 HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)))
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
 					 errmsg_internal("multixact %u from before cutoff %u found to be still running",
-									 multi, cutoffs->MultiXactCutoff)));
+									 multi, cutoffs->OldestMxact)));
 
 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
@@ -6202,14 +6222,14 @@
 			}
 			else
 			{
+				if (TransactionIdPrecedes(newxmax, FreezePageRelfrozenXid))
+					FreezePageRelfrozenXid = newxmax;
 				*flags |= FRM_RETURN_IS_XID;
 			}
 		}
 
-		/*
-		 * Don't push back mxid_oldest_xid_out using FRM_RETURN_IS_XID Xid, or
-		 * when no Xids will remain
-		 */
+		pagefrz->FreezePageRelfrozenXid = FreezePageRelfrozenXid;
+		pagefrz->freeze_required = true;
 		return newxmax;
 	}
 
@@ -6225,11 +6245,13 @@
 	{
 		/* Nothing worth keeping */
 		*flags |= FRM_INVALIDATE_XMAX;
-		return InvalidTransactionId;
+		pagefrz->freeze_required = true;
+		Assert(!TransactionIdIsValid(newxmax));
+		return newxmax;
 	}
 
 	need_replace = false;
-	temp_xid_out = *mxid_oldest_xid_out;	/* init for FRM_NOOP */
+	FreezePageRelfrozenXid = pagefrz->FreezePageRelfrozenXid;	/* for FRM_NOOP */
 	for (int i = 0; i < nmembers; i++)
 	{
 		TransactionId xid = members[i].xid;
@@ -6238,26 +6260,35 @@
 		if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
 		{
+			/* Can't violate the FreezeLimit postcondition */
 			need_replace = true;
 			break;
 		}
-		if (TransactionIdPrecedes(members[i].xid, temp_xid_out))
-			temp_xid_out = members[i].xid;
+		if (TransactionIdPrecedes(xid, FreezePageRelfrozenXid))
+			FreezePageRelfrozenXid = xid;
 	}
 
-	/*
-	 * In the simplest case, there is no member older than FreezeLimit; we can
-	 * keep the existing MultiXactId as-is, avoiding a more expensive second
-	 * pass over the multi
-	 */
+	/* Can't violate the MultiXactCutoff postcondition, either */
+	if (!need_replace)
+		need_replace = MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff);
+
 	if (!need_replace)
 	{
 		/*
-		 * When mxid_oldest_xid_out gets pushed back here it's likely that the
-		 * update Xid was the oldest member, but we don't rely on that
+		 * FRM_NOOP case is the only one where we don't force page-level
+		 * freezing (see header comments)
 		 */
 		*flags |= FRM_NOOP;
-		*mxid_oldest_xid_out = temp_xid_out;
+
+		/*
+		 * Might have to ratchet back NewRelminMxid, NewRelfrozenXid, or both
+		 * together to make it safe to skip this particular multi/tuple xmax
+		 * if the page is frozen (similar handling will also be required if
+		 * the page isn't frozen, but caller deals with that directly).
+		 */
+		pagefrz->FreezePageRelfrozenXid = FreezePageRelfrozenXid;
+		if (MultiXactIdPrecedes(multi, pagefrz->FreezePageRelminMxid))
+			pagefrz->FreezePageRelminMxid = multi;
 
 		pfree(members);
 		return multi;
 	}
 
@@ -6266,13 +6297,18 @@
 	 * Do a more thorough second pass over the multi to figure out which
 	 * member XIDs actually need to be kept.  Checking the precise status of
 	 * individual members might even show that we don't need to keep anything.
+	 *
+	 * We only reach this far when replacing xmax is absolutely mandatory.
+	 * heap_tuple_should_freeze will indicate that the tuple should be frozen.
 	 */
+	Assert(heap_tuple_should_freeze(tuple, cutoffs, &axid, &amxid));
+
 	nnewmembers = 0;
 	newmembers = palloc(sizeof(MultiXactMember) * nmembers);
 	has_lockers = false;
 	update_xid = InvalidTransactionId;
 	update_committed = false;
-	temp_xid_out = *mxid_oldest_xid_out;	/* init for FRM_RETURN_IS_MULTI */
+	FreezePageRelfrozenXid = pagefrz->FreezePageRelfrozenXid;	/* re-init */
 
 	/*
 	 * Determine whether to keep each member xid, or to ignore it instead
@@ -6360,11 +6396,11 @@
 		/*
 		 * We determined that this is an Xid corresponding to an update that
 		 * must be retained -- add it to new members list for later.  Also
-		 * consider pushing back mxid_oldest_xid_out.
+		 * consider pushing back NewRelfrozenXid tracker.
 		 */
 		newmembers[nnewmembers++] = members[i];
-		if (TransactionIdPrecedes(xid, temp_xid_out))
-			temp_xid_out = xid;
+		if (TransactionIdPrecedes(xid, FreezePageRelfrozenXid))
+			FreezePageRelfrozenXid = xid;
 	}
 
 	pfree(members);
@@ -6375,10 +6411,14 @@
 	 */
 	if (nnewmembers == 0)
 	{
-		/* nothing worth keeping!? Tell caller to remove the whole thing */
+		/*
+		 * Keeping nothing (neither an Xid nor a MultiXactId) in xmax.  Won't
+		 * have to ratchet back NewRelfrozenXid or NewRelminMxid.
+		 */
 		*flags |= FRM_INVALIDATE_XMAX;
 		newxmax = InvalidTransactionId;
-		/* Don't push back mxid_oldest_xid_out -- no Xids will remain */
+
+		Assert(pagefrz->FreezePageRelfrozenXid == FreezePageRelfrozenXid);
 	}
 	else if (TransactionIdIsValid(update_xid) && !has_lockers)
 	{
@@ -6394,22 +6434,29 @@
 		if (update_committed)
 			*flags |= FRM_MARK_COMMITTED;
 		newxmax = update_xid;
-		/* Don't push back mxid_oldest_xid_out using FRM_RETURN_IS_XID Xid */
+
+		/* Might have to push back FreezePageRelfrozenXid/NewRelfrozenXid */
+		Assert(TransactionIdPrecedesOrEquals(FreezePageRelfrozenXid,
+											 update_xid));
 	}
 	else
 	{
 		/*
 		 * Create a new multixact with the surviving members of the previous
 		 * one, to set as new Xmax in the tuple.  The oldest surviving member
-		 * might push back mxid_oldest_xid_out.
+		 * might have already pushed back NewRelfrozenXid.
 		 */
 		newxmax = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
 		*flags |= FRM_RETURN_IS_MULTI;
-		*mxid_oldest_xid_out = temp_xid_out;
+
+		/* Never need to push back FreezePageRelminMxid/NewRelminMxid */
+		Assert(MultiXactIdPrecedesOrEquals(cutoffs->OldestMxact, newxmax));
 	}
 
 	pfree(newmembers);
+
+	pagefrz->FreezePageRelfrozenXid = FreezePageRelfrozenXid;
+	pagefrz->freeze_required = true;
 	return newxmax;
 }
 
@@ -6417,9 +6464,9 @@
  * heap_prepare_freeze_tuple
  *
  * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
- * are older than the FreezeLimit and/or MultiXactCutoff freeze cutoffs.  If so,
- * setup enough state (in the *frz output argument) to later execute and
- * WAL-log what caller needs to do for the tuple, and return true.  Return
+ * are older than the OldestXmin and/or OldestMxact freeze cutoffs.  If so,
+ * setup enough state (in the *frz output argument) to enable caller to
+ * process this tuple as part of freezing its page, and return true.  Return
  * false if nothing can be changed about the tuple right now.
 *
 * Also sets *totally_frozen to true if the tuple will be totally frozen once
@@ -6427,22 +6474,30 @@
 * frozen by an earlier VACUUM).  This indicates that there are no remaining
 * XIDs or MultiXactIds that will need to be processed by a future VACUUM.
 *
- * VACUUM caller must assemble HeapTupleFreeze entries for every tuple that we
- * returned true for when called.  A later heap_freeze_execute_prepared call
- * will execute freezing for caller's page as a whole.
+ * VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
+ * tuple that we returned true for, and call heap_freeze_execute_prepared to
+ * execute freezing.  Caller must initialize pagefrz fields for page as a
+ * whole before first call here for each heap page.
+ *
+ * VACUUM caller decides on whether or not to freeze the page as a whole.
+ * We'll often prepare freeze plans for a page that caller just discards.
+ * However, VACUUM doesn't always get to make a choice; it must freeze when
+ * pagefrz.freeze_required is set, to ensure that any XIDs < FreezeLimit (and
+ * MXIDs < MultiXactCutoff) can never be left behind.  We make sure that
+ * VACUUM always follows that rule.
+ *
+ * We sometimes force freezing of xmax MultiXactId values long before it is
+ * strictly necessary to do so just to ensure the FreezeLimit postcondition.
+ * It's worth processing MultiXactIds proactively when it is cheap to do so,
+ * and it's convenient to make that happen by piggy-backing it on the "force
+ * freezing" mechanism.  Conversely, we sometimes delay freezing MultiXactIds
+ * because it is expensive right now (though only when it's still possible to
+ * do so without violating the FreezeLimit/MultiXactCutoff postcondition).
 *
 * It is assumed that the caller has checked the tuple with
 * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
 * (else we should be removing the tuple, not freezing it).
 *
- * The *relfrozenxid_out and *relminmxid_out arguments are the current target
- * relfrozenxid and relminmxid for VACUUM caller's heap rel.  Any and all
- * unfrozen XIDs or MXIDs that remain in caller's rel after VACUUM finishes
- * _must_ have values >= the final relfrozenxid/relminmxid values in pg_class.
- * This includes XIDs that remain as MultiXact members from any tuple's xmax.
- * Each call here pushes back *relfrozenxid_out and/or *relminmxid_out as
- * needed to avoid unsafe final values in rel's authoritative pg_class tuple.
- *
 * NB: This function has side effects: it might allocate a new MultiXactId.
 * It will be set as tuple's new xmax when our *frz output is processed within
 * heap_execute_freeze_tuple later on.  If the tuple is in a shared buffer
@@ -6451,9 +6506,8 @@
 bool
 heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 						  const struct VacuumCutoffs *cutoffs,
-						  HeapTupleFreeze *frz, bool *totally_frozen,
-						  TransactionId *relfrozenxid_out,
-						  MultiXactId *relminmxid_out)
+						  HeapPageFreeze *pagefrz,
+						  HeapTupleFreeze *frz, bool *totally_frozen)
 {
 	bool		xmin_already_frozen = false,
 				xmax_already_frozen = false;
@@ -6470,7 +6524,7 @@
 
 	/*
 	 * Process xmin, while keeping track of whether it's already frozen, or
-	 * will become frozen when our freeze plan is executed by caller (could be
+	 * will become frozen iff our freeze plan is executed by caller (could be
 	 * neither).
 	 */
 	xid = HeapTupleHeaderGetXmin(tuple);
@@ -6484,21 +6538,12 @@
 					 errmsg_internal("found xmin %u from before relfrozenxid %u",
 									 xid, cutoffs->relfrozenxid)));
 
-		freeze_xmin = TransactionIdPrecedes(xid, cutoffs->FreezeLimit);
-		if (freeze_xmin)
-		{
-			if (!TransactionIdDidCommit(xid))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen",
-										 xid, cutoffs->FreezeLimit)));
-		}
-		else
-		{
-			/* xmin to remain unfrozen.  Could push back relfrozenxid_out. */
-			if (TransactionIdPrecedes(xid, *relfrozenxid_out))
-				*relfrozenxid_out = xid;
-		}
+		freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+		if (freeze_xmin && !TransactionIdDidCommit(xid))
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen",
+									 xid, cutoffs->OldestXmin)));
 	}
 
 	/*
@@ -6515,41 +6560,55 @@
 		 * For Xvac, we always freeze proactively.  This allows totally_frozen
 		 * tracking to ignore xvac.
 		 */
-		replace_xvac = true;
+		replace_xvac = pagefrz->freeze_required = true;
 	}
 
-	/*
-	 * Process xmax.  To thoroughly examine the current Xmax value we need to
-	 * resolve a MultiXactId to its member Xids, in case some of them are
-	 * below the given FreezeLimit.  In that case, those values might need
-	 * freezing, too.  Also, if a multi needs freezing, we cannot simply take
-	 * it out --- if there's a live updater Xid, it needs to be kept.
-	 *
-	 * Make sure to keep heap_tuple_would_freeze in sync with this.
-	 */
+	/* Now process xmax */
 	xid = HeapTupleHeaderGetRawXmax(tuple);
-
 	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 	{
 		/* Raw xmax is a MultiXactId */
 		TransactionId newxmax;
 		uint16		flags;
-		TransactionId mxid_oldest_xid_out = *relfrozenxid_out;
 
-		newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
-									&flags, &mxid_oldest_xid_out);
+		/*
+		 * We will either remove xmax completely (in the "freeze_xmax" path),
+		 * process xmax by replacing it (in the "replace_xmax" path), or
+		 * perform no-op xmax processing.  The only constraint is that the
+		 * FreezeLimit/MultiXactCutoff postcondition must never be violated.
+		 */
+		newxmax = FreezeMultiXactId(xid, tuple, cutoffs, &flags, pagefrz);
 
-		if (flags & FRM_RETURN_IS_XID)
+		if (flags & FRM_NOOP)
+		{
+			/*
+			 * xmax is a MultiXactId, and nothing about it changes for now.
+			 * This is the only case where 'freeze_required' won't have been
+			 * set for us by FreezeMultiXactId, as well as the only case where
+			 * neither freeze_xmax nor replace_xmax are set (given a multi).
+			 *
+			 * This is a no-op, but the call to FreezeMultiXactId might have
+			 * ratcheted back NewRelfrozenXid and/or NewRelminMxid for us.
+			 * That makes it safe to freeze the page while leaving this
+			 * particular xmax undisturbed.
+			 *
+			 * FreezeMultiXactId is _not_ responsible for the "no freeze"
+			 * NewRelfrozenXid/NewRelminMxid trackers, though -- that's our
+			 * job.  A call to heap_tuple_should_freeze for this same tuple
+			 * will take place below if 'freeze_required' isn't set already.
+			 * (This approach repeats some of the work from FreezeMultiXactId,
+			 * which is not ideal but makes things simpler.)
+			 */
+			Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
+			Assert(!MultiXactIdPrecedes(newxmax, pagefrz->FreezePageRelminMxid));
+		}
+		else if (flags & FRM_RETURN_IS_XID)
 		{
 			/*
 			 * xmax will become an updater Xid (original MultiXact's updater
 			 * member Xid will be carried forward as a simple Xid in Xmax).
-			 * Might have to ratchet back relfrozenxid_out here, though never
-			 * relminmxid_out.
 			 */
 			Assert(!TransactionIdPrecedes(newxmax, cutoffs->OldestXmin));
-			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
-				*relfrozenxid_out = newxmax;
 
 			/*
 			 * NB -- some of these transformations are only valid because we
@@ -6572,13 +6631,8 @@
 			/*
 			 * xmax is an old MultiXactId that we have to replace with a new
 			 * MultiXactId, to carry forward two or more original member XIDs.
-			 * Might have to ratchet back relfrozenxid_out here, though never
-			 * relminmxid_out.
 			 */
 			Assert(!MultiXactIdPrecedes(newxmax, cutoffs->OldestMxact));
-			Assert(TransactionIdPrecedesOrEquals(mxid_oldest_xid_out,
-												 *relfrozenxid_out));
-			*relfrozenxid_out = mxid_oldest_xid_out;
 
 			/*
 			 * We can't use GetMultiXactIdHintBits directly on the new multi
@@ -6594,20 +6648,6 @@
 			frz->xmax = newxmax;
 			replace_xmax = true;
 		}
-		else if (flags & FRM_NOOP)
-		{
-			/*
-			 * xmax is a MultiXactId, and nothing about it changes for now.
-			 * Might have to ratchet back relminmxid_out, relfrozenxid_out, or
-			 * both together.
-			 */
-			Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
-			Assert(TransactionIdPrecedesOrEquals(mxid_oldest_xid_out,
-												 *relfrozenxid_out));
-			if (MultiXactIdPrecedes(xid, *relminmxid_out))
-				*relminmxid_out = xid;
-			*relfrozenxid_out = mxid_oldest_xid_out;
-		}
 		else
 		{
 			/*
@@ -6621,6 +6661,9 @@
 			/* Will set t_infomask/t_infomask2 flags in freeze plan below */
 			freeze_xmax = true;
 		}
+
+		/* Only FRM_NOOP doesn't force caller to freeze page */
+		Assert(pagefrz->freeze_required || (!freeze_xmax && !replace_xmax));
 	}
 	else if (TransactionIdIsNormal(xid))
 	{
@@ -6631,28 +6674,21 @@
 					 errmsg_internal("found xmax %u from before relfrozenxid %u",
 									 xid, cutoffs->relfrozenxid)));
 
-		if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
-		{
-			/*
-			 * If we freeze xmax, make absolutely sure that it's not an XID
-			 * that is important.  (Note, a lock-only xmax can be removed
-			 * independent of committedness, since a committed lock holder has
-			 * released the lock).
-			 */
-			if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
-				TransactionIdDidCommit(xid))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("cannot freeze committed xmax %u",
-										 xid)));
+		if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
 			freeze_xmax = true;
-			/* No need for relfrozenxid_out handling, since we'll freeze xmax */
-		}
-		else
-		{
-			if (TransactionIdPrecedes(xid, *relfrozenxid_out))
-				*relfrozenxid_out = xid;
-		}
+
+		/*
+		 * If we freeze xmax, make absolutely sure that it's not an XID that
+		 * is important.  (Note, a lock-only xmax can be removed independent
+		 * of committedness, since a committed lock holder has released the
+		 * lock).
+ */ + if (freeze_xmax && !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) && + TransactionIdDidCommit(xid)) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("cannot freeze committed xmax %u", + xid))); } else if (!TransactionIdIsValid(xid)) { @@ -6679,6 +6715,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, * failed; whereas a non-dead MOVED_IN tuple must mean the xvac * transaction succeeded. */ + Assert(pagefrz->freeze_required); if (tuple->t_infomask & HEAP_MOVED_OFF) frz->frzflags |= XLH_INVALID_XVAC; else @@ -6687,6 +6724,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, if (replace_xmax) { Assert(!xmax_already_frozen && !freeze_xmax); + Assert(pagefrz->freeze_required); /* Already set t_infomask/t_infomask2 flags in freeze plan */ } @@ -6709,7 +6747,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, /* * Determine if this tuple is already totally frozen, or will become - * totally frozen + * totally frozen (provided caller executes freeze plan for the page) */ *totally_frozen = ((freeze_xmin || xmin_already_frozen) && (freeze_xmax || xmax_already_frozen)); @@ -6717,6 +6755,19 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, /* A "totally_frozen" tuple must not leave anything behind in xmax */ Assert(!*totally_frozen || !replace_xmax); + /* + * Check if the option of _not_ freezing caller's page is still in play, + * though don't bother when we already forced freezing earlier on + */ + if (!pagefrz->freeze_required && !(xmin_already_frozen && + xmax_already_frozen)) + { + pagefrz->freeze_required = + heap_tuple_should_freeze(tuple, cutoffs, + &pagefrz->NoFreezePageRelfrozenXid, + &pagefrz->NoFreezePageRelminMxid); + } + /* Tell caller if this tuple has a usable freeze plan set in *frz */ return freeze_xmin || replace_xvac || replace_xmax || freeze_xmax; } @@ -6761,13 +6812,12 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz) */ void heap_freeze_execute_prepared(Relation rel, Buffer buffer, - 
TransactionId FreezeLimit, + TransactionId snapshotConflictHorizon, HeapTupleFreeze *tuples, int ntuples) { Page page = BufferGetPage(buffer); Assert(ntuples > 0); - Assert(TransactionIdIsNormal(FreezeLimit)); START_CRIT_SECTION(); @@ -6790,19 +6840,10 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer, int nplans; xl_heap_freeze_page xlrec; XLogRecPtr recptr; - TransactionId snapshotConflictHorizon; /* Prepare deduplicated representation for use in WAL record */ nplans = heap_xlog_freeze_plan(tuples, ntuples, plans, offsets); - /* - * FreezeLimit is (approximately) the first XID not frozen by VACUUM. - * Back up caller's FreezeLimit to avoid false conflicts when - * FreezeLimit is precisely equal to VACUUM's OldestXmin cutoff. - */ - snapshotConflictHorizon = FreezeLimit; - TransactionIdRetreat(snapshotConflictHorizon); - xlrec.snapshotConflictHorizon = snapshotConflictHorizon; xlrec.nplans = nplans; @@ -6843,8 +6884,7 @@ heap_freeze_tuple(HeapTupleHeader tuple, bool do_freeze; bool totally_frozen; struct VacuumCutoffs cutoffs; - TransactionId NewRelfrozenXid = FreezeLimit; - MultiXactId NewRelminMxid = MultiXactCutoff; + HeapPageFreeze pagefrz; cutoffs.relfrozenxid = relfrozenxid; cutoffs.relminmxid = relminmxid; @@ -6853,9 +6893,14 @@ heap_freeze_tuple(HeapTupleHeader tuple, cutoffs.FreezeLimit = FreezeLimit; cutoffs.MultiXactCutoff = MultiXactCutoff; - do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs, - &frz, &totally_frozen, - &NewRelfrozenXid, &NewRelminMxid); + pagefrz.freeze_required = true; + pagefrz.NoFreezePageRelfrozenXid = FreezeLimit; + pagefrz.NoFreezePageRelminMxid = MultiXactCutoff; + pagefrz.FreezePageRelfrozenXid = FreezeLimit; + pagefrz.FreezePageRelminMxid = MultiXactCutoff; + + do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs, &pagefrz, + &frz, &totally_frozen); /* * Note that because this is not a WAL-logged operation, we don't need to @@ -7278,22 +7323,23 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple) } /* - * 
heap_tuple_would_freeze + * heap_tuple_should_freeze * - * Return value indicates if heap_prepare_freeze_tuple sibling function would - * freeze any of the XID/MXID fields from the tuple, given the same cutoffs. - * We must also deal with dead tuples here, since (xmin, xmax, xvac) fields - * could be processed by pruning away the whole tuple instead of freezing. + * Return value indicates if heap_prepare_freeze_tuple sibling function should + * force freezing of the page containing the tuple. This happens whenever the + * tuple contains XID/MXID fields with values < FreezeLimit/MultiXactCutoff. * - * The *relfrozenxid_out and *relminmxid_out input/output arguments work just - * like the heap_prepare_freeze_tuple arguments that they're based on. We - * never freeze here, which makes tracking the oldest extant XID/MXID simple. + * The *NoFreezePageRelfrozenXid and *NoFreezePageRelminMxid input/output + * arguments help VACUUM track the oldest extant XID/MXID remaining in rel. + * Our working assumption is that caller won't decide to freeze this tuple. + * It's up to caller to only ratchet back its own top-level trackers after the + * point that it commits to not freezing the tuple/page in question.
*/ bool -heap_tuple_would_freeze(HeapTupleHeader tuple, - const struct VacuumCutoffs *cutoffs, - TransactionId *relfrozenxid_out, - MultiXactId *relminmxid_out) +heap_tuple_should_freeze(HeapTupleHeader tuple, + const struct VacuumCutoffs *cutoffs, + TransactionId *NoFreezePageRelfrozenXid, + MultiXactId *NoFreezePageRelminMxid) { TransactionId xid; MultiXactId multi; @@ -7304,8 +7350,8 @@ heap_tuple_would_freeze(HeapTupleHeader tuple, if (TransactionIdIsNormal(xid)) { Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid)); - if (TransactionIdPrecedes(xid, *relfrozenxid_out)) - *relfrozenxid_out = xid; + if (TransactionIdPrecedes(xid, *NoFreezePageRelfrozenXid)) + *NoFreezePageRelfrozenXid = xid; if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit)) freeze = true; } @@ -7322,8 +7368,8 @@ heap_tuple_would_freeze(HeapTupleHeader tuple, { Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid)); /* xmax is a non-permanent XID */ - if (TransactionIdPrecedes(xid, *relfrozenxid_out)) - *relfrozenxid_out = xid; + if (TransactionIdPrecedes(xid, *NoFreezePageRelfrozenXid)) + *NoFreezePageRelfrozenXid = xid; if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit)) freeze = true; } @@ -7334,8 +7380,8 @@ heap_tuple_would_freeze(HeapTupleHeader tuple, else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask)) { /* xmax is a pg_upgrade'd MultiXact, which can't have updater XID */ - if (MultiXactIdPrecedes(multi, *relminmxid_out)) - *relminmxid_out = multi; + if (MultiXactIdPrecedes(multi, *NoFreezePageRelminMxid)) + *NoFreezePageRelminMxid = multi; /* heap_prepare_freeze_tuple always freezes pg_upgrade'd xmax */ freeze = true; } @@ -7346,8 +7392,8 @@ heap_tuple_would_freeze(HeapTupleHeader tuple, int nmembers; Assert(MultiXactIdPrecedesOrEquals(cutoffs->relminmxid, multi)); - if (MultiXactIdPrecedes(multi, *relminmxid_out)) - *relminmxid_out = multi; + if (MultiXactIdPrecedes(multi, *NoFreezePageRelminMxid)) + *NoFreezePageRelminMxid = multi; if 
(MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff)) freeze = true; @@ -7359,8 +7405,8 @@ heap_tuple_would_freeze(HeapTupleHeader tuple, { xid = members[i].xid; Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid)); - if (TransactionIdPrecedes(xid, *relfrozenxid_out)) - *relfrozenxid_out = xid; + if (TransactionIdPrecedes(xid, *NoFreezePageRelfrozenXid)) + *NoFreezePageRelfrozenXid = xid; if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit)) freeze = true; } @@ -7374,9 +7420,9 @@ heap_tuple_would_freeze(HeapTupleHeader tuple, if (TransactionIdIsNormal(xid)) { Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid)); - if (TransactionIdPrecedes(xid, *relfrozenxid_out)) - *relfrozenxid_out = xid; - /* heap_prepare_freeze_tuple always freezes xvac */ + if (TransactionIdPrecedes(xid, *NoFreezePageRelfrozenXid)) + *NoFreezePageRelfrozenXid = xid; + /* heap_prepare_freeze_tuple forces xvac freezing */ freeze = true; } } diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c index 91c5f5e9e..e334ee8dc 100644 --- a/src/backend/access/heap/pruneheap.c +++ b/src/backend/access/heap/pruneheap.c @@ -21,6 +21,7 @@ #include "access/xlog.h" #include "access/xloginsert.h" #include "catalog/catalog.h" +#include "executor/instrument.h" #include "miscadmin.h" #include "pgstat.h" #include "storage/bufmgr.h" @@ -205,9 +206,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer) { int ndeleted, nnewlpdead; + bool fpi; ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin, - limited_ts, &nnewlpdead, NULL); + limited_ts, &nnewlpdead, &fpi, NULL); /* * Report the number of tuples reclaimed to pgstats. This is @@ -255,7 +257,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer) * InvalidTransactionId/0 respectively. * * Sets *nnewlpdead for caller, indicating the number of items that were - * newly set LP_DEAD during prune operation. + * newly set LP_DEAD during prune operation. 
Also sets *prune_fpi for + * caller, indicating if pruning generated a full-page image as torn page + * protection. * * off_loc is the offset location required by the caller to use in error * callback. @@ -267,7 +271,7 @@ heap_page_prune(Relation relation, Buffer buffer, GlobalVisState *vistest, TransactionId old_snap_xmin, TimestampTz old_snap_ts, - int *nnewlpdead, + int *nnewlpdead, bool *prune_fpi, OffsetNumber *off_loc) { int ndeleted = 0; @@ -380,6 +384,8 @@ heap_page_prune(Relation relation, Buffer buffer, if (off_loc) *off_loc = InvalidOffsetNumber; + *prune_fpi = false; /* for now */ + /* Any error while applying the changes is critical */ START_CRIT_SECTION(); @@ -417,6 +423,7 @@ heap_page_prune(Relation relation, Buffer buffer, { xl_heap_prune xlrec; XLogRecPtr recptr; + int64 wal_fpi_before = pgWalUsage.wal_fpi; xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon; xlrec.nredirected = prstate.nredirected; @@ -448,6 +455,9 @@ heap_page_prune(Relation relation, Buffer buffer, recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE); PageSetLSN(BufferGetPage(buffer), recptr); + + if (wal_fpi_before != pgWalUsage.wal_fpi) + *prune_fpi = true; } } else diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 98ccb9882..5bd35fbd4 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1525,8 +1525,9 @@ lazy_scan_prune(LVRelState *vacrel, live_tuples, recently_dead_tuples; int nnewlpdead; - TransactionId NewRelfrozenXid; - MultiXactId NewRelminMxid; + bool prune_fpi; + HeapPageFreeze pagefrz; + bool freeze_all_eligible PG_USED_FOR_ASSERTS_ONLY; OffsetNumber deadoffsets[MaxHeapTuplesPerPage]; HeapTupleFreeze frozen[MaxHeapTuplesPerPage]; @@ -1542,8 +1543,11 @@ lazy_scan_prune(LVRelState *vacrel, retry: /* Initialize (or reset) page-level state */ - NewRelfrozenXid = vacrel->NewRelfrozenXid; - NewRelminMxid = vacrel->NewRelminMxid; + pagefrz.freeze_required = false; + 
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid; + pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid; + pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid; + pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid; tuples_deleted = 0; tuples_frozen = 0; lpdead_items = 0; @@ -1561,7 +1565,7 @@ retry: */ tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest, InvalidTransactionId, 0, &nnewlpdead, - &vacrel->offnum); + &prune_fpi, &vacrel->offnum); /* * Now scan the page to collect LP_DEAD items and check for tuples @@ -1596,27 +1600,23 @@ retry: continue; } - /* - * LP_DEAD items are processed outside of the loop. - * - * Note that we deliberately don't set hastup=true in the case of an - * LP_DEAD item here, which is not how count_nondeletable_pages() does - * it -- it only considers pages empty/truncatable when they have no - * items at all (except LP_UNUSED items). - * - * Our assumption is that any LP_DEAD items we encounter here will - * become LP_UNUSED inside lazy_vacuum_heap_page() before we actually - * call count_nondeletable_pages(). In any case our opinion of - * whether or not a page 'hastup' (which is how our caller sets its - * vacrel->nonempty_pages value) is inherently race-prone. It must be - * treated as advisory/unreliable, so we might as well be slightly - * optimistic. - */ if (ItemIdIsDead(itemid)) { + /* + * Delay unsetting all_visible until after we have decided on + * whether this page should be frozen. We need to test "is this + * page all_visible, assuming any LP_DEAD items are set LP_UNUSED + * in final heap pass?" to reach a decision. all_visible will be + * unset before we return, as required by lazy_scan_heap caller. + * + * Deliberately don't set hastup for LP_DEAD items. We make the + * soft assumption that any LP_DEAD items encountered here will + * become LP_UNUSED later on, before count_nondeletable_pages is + * reached. Our opinion of whether the page 'hastup' is inherently + * race-prone.
+ * It must be treated as unreliable by caller anyway, so we might + * as well be slightly optimistic about it. + */ deadoffsets[lpdead_items++] = offnum; - prunestate->all_visible = false; - prunestate->has_lpdead_items = true; continue; } @@ -1743,9 +1743,8 @@ retry: prunestate->hastup = true; /* page makes rel truncation unsafe */ /* Tuple with storage -- consider need to freeze */ - if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, - &frozen[tuples_frozen], &totally_frozen, - &NewRelfrozenXid, &NewRelminMxid)) + if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz, + &frozen[tuples_frozen], &totally_frozen)) { /* Save prepared freeze plan for later */ frozen[tuples_frozen++].offset = offnum; @@ -1766,23 +1765,69 @@ retry: * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple * that remains and needs to be considered for freezing now (LP_UNUSED and * LP_REDIRECT items also remain, but are of no further interest to us). + * + * Freeze the page when heap_prepare_freeze_tuple indicates that at least + * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also + * freeze when pruning generated an FPI, if doing so means that we set the + * page all-frozen afterwards (this could happen during second heap pass). */ - vacrel->NewRelfrozenXid = NewRelfrozenXid; - vacrel->NewRelminMxid = NewRelminMxid; + if (pagefrz.freeze_required || tuples_frozen == 0 || + (prunestate->all_visible && prunestate->all_frozen && prune_fpi)) + { + /* + * We're freezing the page. Our final NewRelfrozenXid doesn't need to + * be affected by the XIDs that are just about to be frozen anyway. + * + * Note: although we're freezing all eligible tuples on this page, we + * might not need any freeze plans to do so (pruning might be enough). 
+ * We always assume that a call to heap_prepare_freeze_tuple that had + * to ratchet back the "freeze" NewRelfrozenXid/NewRelminMxid trackers + * might have taken place earlier, though; having zero freeze plans + * does not indicate that it's safe to skip this step. + */ + vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid; + vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid; + freeze_all_eligible = true; + } + else + { + /* NewRelfrozenXid <= all XIDs in tuples that weren't pruned away */ + vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid; + vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid; + + /* Might still set page all-visible, but never all-frozen */ + tuples_frozen = 0; + freeze_all_eligible = prunestate->all_frozen = false; + } /* * Consider the need to freeze any items with tuple storage from the page - * first (arbitrary) */ if (tuples_frozen > 0) { - Assert(prunestate->hastup); + TransactionId snapshotConflictHorizon; + + Assert(prunestate->hastup && freeze_all_eligible); vacrel->frozen_pages++; + /* + * We can use the latest xmin cutoff (which is generally used for 'VM + * set' conflicts) as our cutoff for freeze conflicts when the whole + * page is eligible to become all-frozen in the VM once frozen by us. + * Otherwise use a conservative cutoff (just back up from OldestXmin). 
+ */ + if (prunestate->all_visible && prunestate->all_frozen) + snapshotConflictHorizon = prunestate->visibility_cutoff_xid; + else + { + snapshotConflictHorizon = vacrel->cutoffs.OldestXmin; + TransactionIdRetreat(snapshotConflictHorizon); + } + /* Execute all freeze plans for page as a single atomic action */ heap_freeze_execute_prepared(vacrel->rel, buf, - vacrel->cutoffs.FreezeLimit, + snapshotConflictHorizon, frozen, tuples_frozen); } @@ -1801,7 +1846,7 @@ retry: */ #ifdef USE_ASSERT_CHECKING /* Note that all_frozen value does not matter when !all_visible */ - if (prunestate->all_visible) + if (prunestate->all_visible && lpdead_items == 0) { TransactionId cutoff; bool all_frozen; @@ -1809,8 +1854,7 @@ retry: if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen)) Assert(false); - Assert(lpdead_items == 0); - Assert(prunestate->all_frozen == all_frozen); + Assert(prunestate->all_frozen == all_frozen || !freeze_all_eligible); /* * It's possible that we froze tuples and made the page's XID cutoff @@ -1831,9 +1875,6 @@ retry: VacDeadItems *dead_items = vacrel->dead_items; ItemPointerData tmp; - Assert(!prunestate->all_visible); - Assert(prunestate->has_lpdead_items); - vacrel->lpdead_item_pages++; ItemPointerSetBlockNumber(&tmp, blkno); @@ -1847,6 +1888,10 @@ retry: Assert(dead_items->num_items <= dead_items->max_items); pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES, dead_items->num_items); + + /* lazy_scan_heap caller expects LP_DEAD item to unset all_visible */ + prunestate->all_visible = false; + prunestate->has_lpdead_items = true; } /* Finally, add page-local counts to whole-VACUUM counts */ @@ -1891,8 +1936,8 @@ lazy_scan_noprune(LVRelState *vacrel, recently_dead_tuples, missed_dead_tuples; HeapTupleHeader tupleheader; - TransactionId NewRelfrozenXid = vacrel->NewRelfrozenXid; - MultiXactId NewRelminMxid = vacrel->NewRelminMxid; + TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid; + MultiXactId NoFreezePageRelminMxid = 
vacrel->NewRelminMxid; OffsetNumber deadoffsets[MaxHeapTuplesPerPage]; Assert(BufferGetBlockNumber(buf) == blkno); @@ -1937,8 +1982,9 @@ lazy_scan_noprune(LVRelState *vacrel, *hastup = true; /* page prevents rel truncation */ tupleheader = (HeapTupleHeader) PageGetItem(page, itemid); - if (heap_tuple_would_freeze(tupleheader, &vacrel->cutoffs, - &NewRelfrozenXid, &NewRelminMxid)) + if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs, + &NoFreezePageRelfrozenXid, + &NoFreezePageRelminMxid)) { /* Tuple with XID < FreezeLimit (or MXID < MultiXactCutoff) */ if (vacrel->aggressive) @@ -2019,8 +2065,8 @@ lazy_scan_noprune(LVRelState *vacrel, * this particular page until the next VACUUM. Remember its details now. * (lazy_scan_prune expects a clean slate, so we have to do this last.) */ - vacrel->NewRelfrozenXid = NewRelfrozenXid; - vacrel->NewRelminMxid = NewRelminMxid; + vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid; + vacrel->NewRelminMxid = NoFreezePageRelminMxid; /* Save any LP_DEAD items found on the page in dead_items array */ if (vacrel->nindexes == 0) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 9eedab652..44e15b5fb 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -9194,9 +9194,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; - Specifies the cutoff age (in transactions) that VACUUM - should use to decide whether to freeze row versions - while scanning a table. + Specifies the cutoff age (in transactions) that + VACUUM should use to decide whether to + trigger freezing of pages that have an older XID. The default is 50 million transactions. 
Although users can set this value anywhere from zero to one billion, VACUUM will silently limit the effective value to half @@ -9274,9 +9274,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Specifies the cutoff age (in multixacts) that VACUUM - should use to decide whether to replace multixact IDs with a newer - transaction ID or multixact ID while scanning a table. The default - is 5 million multixacts. + should use to decide whether to trigger freezing of pages with + an older multixact ID. The default is 5 million multixacts. Although users can set this value anywhere from zero to one billion, VACUUM will silently limit the effective value to half the value of , -- 2.38.1