From 43ab00609392ed7ad31be491834bdac348e13653 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Fri, 11 Mar 2022 19:16:02 -0800
Subject: [PATCH v10 3/3] Make page-level characteristics drive freezing.

Teach VACUUM to freeze all of the tuples on a page whenever it notices
that it would otherwise mark the page all-visible, without also marking
it all-frozen.  VACUUM typically won't freeze _any_ tuples on the page
unless _all_ tuples (that remain after pruning) are all-visible.  This
makes the overhead of vacuuming much more predictable over time.  We
avoid the need for large balloon payments during aggressive VACUUMs
(typically anti-wraparound autovacuums).  Freezing is proactive, so
we're much less likely to get into "freezing debt".

The new approach to freezing also enables relfrozenxid advancement in
non-aggressive VACUUMs, which might be enough to avoid aggressive
VACUUMs altogether (with many individual tables/workloads).  While the
non-aggressive case continues to skip all-visible (but not all-frozen)
pages, that will no longer hinder relfrozenxid advancement (outside of
pg_upgrade scenarios).  We now try to avoid leaving behind all-visible
(not all-frozen) pages.  This (as well as work from commit 44fa84881f)
makes relfrozenxid advancement in non-aggressive VACUUMs commonplace.

There is also a clear disadvantage to the new approach to freezing:
more eager freezing will impose overhead on cases that don't receive
any benefit.  This is considered an acceptable trade-off.  The new
algorithm tends to avoid freezing early on pages where it makes the
least sense, since frequently modified pages are unlikely to be
all-visible.

The system accumulates freezing debt in proportion to the number of
physical heap pages with unfrozen tuples, more or less.  Anything based
on XID age is likely to be a poor proxy for the eventual cost of
freezing (during the inevitable anti-wraparound autovacuum).
At a high level, freezing is now treated as one of the costs of storing
tuples in physical heap pages -- not a cost of transactions that
allocate XIDs.  Although vacuum_freeze_min_age and
vacuum_multixact_freeze_min_age still influence what we freeze, and
when, they have little influence in many important cases.

It may still be necessary to "freeze a page" due to the presence of a
particularly old XID, from before VACUUM's FreezeLimit cutoff.
FreezeLimit can only trigger page-level freezing, though -- it cannot
change how freezing is actually executed.  All XIDs < OldestXmin and
all MXIDs < OldestMxact will now be frozen on any page that VACUUM
decides to freeze, regardless of the details behind its decision.

Author: Peter Geoghegan
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
---
 src/include/access/heapam_xlog.h     |   7 +-
 src/backend/access/heap/heapam.c     |  92 +++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 116 ++++++++++++++++++---------
 src/backend/commands/vacuum.c        |   8 ++
 doc/src/sgml/maintenance.sgml        |   9 +--
 5 files changed, 172 insertions(+), 60 deletions(-)

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 2d8a7f627..2c25e72b2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -409,10 +409,15 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  TransactionId relminmxid,
 									  TransactionId cutoff_xid,
 									  TransactionId cutoff_multi,
+									  TransactionId limit_xid,
+									  MultiXactId limit_multi,
 									  xl_heap_freeze_tuple *frz,
 									  bool *totally_frozen,
+									  bool *force_freeze,
 									  TransactionId *relfrozenxid_out,
-									  MultiXactId *relminmxid_out);
+									  MultiXactId *relminmxid_out,
+									  TransactionId *relfrozenxid_nofreeze_out,
+									  MultiXactId *relminmxid_nofreeze_out);
 extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
									  xl_heap_freeze_tuple *xlrec_tp);
 extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2e859e427..3454201f3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6446,14 +6446,38 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * are older than the specified cutoff XID and cutoff MultiXactId.  If so,
  * setup enough state (in the *frz output argument) to later execute and
  * WAL-log what we would need to do, and return true.  Return false if nothing
- * is to be changed.  In addition, set *totally_frozen to true if the tuple
+ * can be changed.  In addition, set *totally_frozen to true if the tuple
  * will be totally frozen after these operations are performed and false if
  * more freezing will eventually be required.
  *
+ * Although this interface is primarily tuple-based, vacuumlazy.c caller
+ * cooperates with us to decide on whether or not to freeze whole pages,
+ * together as a single group.  We prepare for freezing at the level of each
+ * tuple, but the final decision is made for the page as a whole.  All pages
+ * that are frozen within a given VACUUM operation are frozen according to
+ * cutoff_xid and cutoff_multi.  Caller _must_ freeze the whole page when
+ * we've set *force_freeze to true!
+ *
+ * cutoff_xid must be caller's oldest xmin to ensure that any XID older than
+ * it could neither be running nor seen as running by any open transaction.
+ * This ensures that the replacement will not change anyone's idea of the
+ * tuple state.  Similarly, cutoff_multi must be the smallest MultiXactId used
+ * by any open transaction (at the time that the oldest xmin was acquired).
+ *
+ * limit_xid must be <= cutoff_xid, and limit_multi must be <= cutoff_multi.
+ * When any XID/XMID from before these secondary cutoffs is encountered, we
+ * set *force_freeze to true, making caller freeze the page (freezing-eligible
+ * XIDs/XMIDs will be frozen, at least).  Forcing freezing like this ensures
+ * that VACUUM won't allow XIDs/XMIDs to ever get too old.  This shouldn't be
+ * necessary very often.  VACUUM should prefer to freeze when it's cheap (not
+ * when it's urgent).
+ *
  * Maintains *relfrozenxid_out and *relminmxid_out, which are the current
- * target relfrozenxid and relminmxid for the relation.  Caller should make
- * temp copies of global tracking variables before starting to process a page,
- * so that we can only scribble on copies.
+ * target relfrozenxid and relminmxid for the relation.  There are also "no
+ * freeze" variants (*relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out)
+ * that are used by caller when it decides to not freeze the page.  Caller
+ * should make temp copies of global tracking variables before starting to
+ * process a page, so that we only scribble on the copies.
  *
  * Caller is responsible for setting the offset field, if appropriate.
  *
@@ -6461,13 +6485,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
  * (else we should be removing the tuple, not freezing it).
  *
- * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
- * XID older than it could neither be running nor seen as running by any
- * open transaction.  This ensures that the replacement will not change
- * anyone's idea of the tuple state.
- * Similarly, cutoff_multi must be less than or equal to the smallest
- * MultiXactId used by any transaction currently open.
- *
  * If the tuple is in a shared buffer, caller must hold an exclusive lock on
  * that buffer.
  *
@@ -6479,11 +6496,16 @@
 bool
 heap_prepare_freeze_tuple(HeapTupleHeader tuple,
						  TransactionId relfrozenxid, TransactionId relminmxid,
						  TransactionId cutoff_xid, TransactionId cutoff_multi,
-						  xl_heap_freeze_tuple *frz, bool *totally_frozen,
+						  TransactionId limit_xid, MultiXactId limit_multi,
+						  xl_heap_freeze_tuple *frz,
+						  bool *totally_frozen, bool *force_freeze,
						  TransactionId *relfrozenxid_out,
-						  MultiXactId *relminmxid_out)
+						  MultiXactId *relminmxid_out,
+						  TransactionId *relfrozenxid_nofreeze_out,
+						  MultiXactId *relminmxid_nofreeze_out)
 {
	bool		changed = false;
+	bool		xmin_already_frozen = false;
	bool		xmax_already_frozen = false;
	bool		xmin_frozen;
	bool		freeze_xmax;
@@ -6504,7 +6526,10 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
	 */
	xid = HeapTupleHeaderGetXmin(tuple);
	if (!TransactionIdIsNormal(xid))
+	{
+		xmin_already_frozen = true;
		xmin_frozen = true;
+	}
	else
	{
		if (TransactionIdPrecedes(xid, relfrozenxid))
@@ -6534,7 +6559,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
	 * resolve a MultiXactId to its member Xids, in case some of them are
	 * below the given cutoff for Xids.  In that case, those values might need
	 * freezing, too.  Also, if a multi needs freezing, we cannot simply take
-	 * it out --- if there's a live updater Xid, it needs to be kept.
+	 * it out --- if there's a live updater Xid, it needs to be kept.  If we
+	 * need to allocate a new MultiXact for that purpose, we will force
+	 * caller to freeze the page.
	 *
	 * Make sure to keep heap_tuple_needs_freeze in sync with this.
	 */
@@ -6580,6 +6607,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
			Assert(TransactionIdIsValid(newxmax));
			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
				*relfrozenxid_out = newxmax;
+
+			/*
+			 * We have an opportunity to get rid of this MultiXact now, so
+			 * force freezing to avoid wasting it
+			 */
+			*force_freeze = true;
		}
		else if (flags & FRM_RETURN_IS_MULTI)
		{
@@ -6616,6 +6649,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
			Assert(TransactionIdPrecedesOrEquals(xmax_oldest_xid_out,
												 *relfrozenxid_out));
			*relfrozenxid_out = xmax_oldest_xid_out;
+
+			/*
+			 * We allocated a MultiXact for this, so force freezing to avoid
+			 * wasting it
+			 */
+			*force_freeze = true;
		}
		else if (flags & FRM_NOOP)
		{
@@ -6734,11 +6773,27 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
			Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
			frz->t_infomask |= HEAP_XMIN_COMMITTED;
			changed = true;
+
+			/* Seems like a good idea to freeze early when this case is hit */
+			*force_freeze = true;
		}
	}

	*totally_frozen = (xmin_frozen &&
					   (freeze_xmax || xmax_already_frozen));
+
+	/*
+	 * Maintain alternative versions of relfrozenxid_out/relminmxid_out that
+	 * leave caller with the option of *not* freezing the page.  If caller has
+	 * already lost that option (e.g. when the page has an old XID that we
+	 * must force caller to freeze), then we don't waste time on this.
+	 */
+	if (!*force_freeze && (!xmin_already_frozen || !xmax_already_frozen))
+		*force_freeze = heap_tuple_needs_freeze(tuple,
+												limit_xid, limit_multi,
+												relfrozenxid_nofreeze_out,
+												relminmxid_nofreeze_out);
+
	return changed;
 }
@@ -6790,15 +6845,22 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 {
	xl_heap_freeze_tuple frz;
	bool		do_freeze;
+	bool		force_freeze = true;
	bool		tuple_totally_frozen;
	TransactionId relfrozenxid_out = cutoff_xid;
	MultiXactId relminmxid_out = cutoff_multi;
+	TransactionId relfrozenxid_nofreeze_out = cutoff_xid;
+	MultiXactId relminmxid_nofreeze_out = cutoff_multi;

	do_freeze = heap_prepare_freeze_tuple(tuple,
										  relfrozenxid, relminmxid,
										  cutoff_xid, cutoff_multi,
+										  cutoff_xid, cutoff_multi,
										  &frz, &tuple_totally_frozen,
-										  &relfrozenxid_out, &relminmxid_out);
+										  &force_freeze,
+										  &relfrozenxid_out, &relminmxid_out,
+										  &relfrozenxid_nofreeze_out,
+										  &relminmxid_nofreeze_out);

	/*
	 * Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3bc75d401..7e2d03ba6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -169,8 +169,9 @@ typedef struct LVRelState
	/* VACUUM operation's cutoffs for freezing and pruning */
	TransactionId OldestXmin;
+	MultiXactId OldestMxact;
	GlobalVisState *vistest;
-	/* VACUUM operation's target cutoffs for freezing XIDs and MultiXactIds */
+	/* Limits on the age of the oldest unfrozen XID and MXID */
	TransactionId FreezeLimit;
	MultiXactId MultiXactCutoff;
	/* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
@@ -199,6 +200,7 @@ typedef struct LVRelState
	BlockNumber rel_pages;		/* total number of pages */
	BlockNumber scanned_pages;	/* # pages examined (not skipped via VM) */
	BlockNumber removed_pages;	/* # pages removed by relation truncation */
+	BlockNumber newly_frozen_pages; /* # pages frozen by lazy_scan_prune */
	BlockNumber lpdead_item_pages;	/* # pages with LP_DEAD items */
	BlockNumber missed_dead_pages;	/* # pages with missed dead tuples */
	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
@@ -477,6 +479,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
	/* Initialize page counters explicitly (be tidy) */
	vacrel->scanned_pages = 0;
	vacrel->removed_pages = 0;
+	vacrel->newly_frozen_pages = 0;
	vacrel->lpdead_item_pages = 0;
	vacrel->missed_dead_pages = 0;
	vacrel->nonempty_pages = 0;
@@ -514,10 +517,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
	 */
	vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
	vacrel->OldestXmin = OldestXmin;
+	vacrel->OldestMxact = OldestMxact;
	vacrel->vistest = GlobalVisTestFor(rel);
-	/* FreezeLimit controls XID freezing (always <= OldestXmin) */
+	/* FreezeLimit limits unfrozen XID age (always <= OldestXmin) */
	vacrel->FreezeLimit = FreezeLimit;
-	/* MultiXactCutoff controls MXID freezing (always <= OldestMxact) */
+	/* MultiXactCutoff limits unfrozen MXID age (always <= OldestMxact) */
	vacrel->MultiXactCutoff = MultiXactCutoff;
	/* Initialize state used to track oldest extant XID/XMID */
	vacrel->NewRelfrozenXid = OldestXmin;
@@ -583,7 +587,14 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
	 */
	if (vacrel->skippedallvis)
	{
-		/* Cannot advance relfrozenxid/relminmxid */
+		/*
+		 * Skipped some all-visible pages, so definitely cannot advance
+		 * relfrozenxid.  This is generally only expected in pg_upgrade
+		 * scenarios, since VACUUM now avoids setting a page to all-visible
+		 * but not all-frozen.  However, it's also possible (though quite
+		 * unlikely) that we ended up here because somebody else cleared some
+		 * page's all-frozen flag (without clearing its all-visible flag).
+		 */
		Assert(!aggressive);
		frozenxid_updated = minmulti_updated = false;
		vac_update_relstats(rel, new_rel_pages, new_live_tuples,
@@ -685,9 +696,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
							 vacrel->relnamespace,
							 vacrel->relname,
							 vacrel->num_index_scans);
-			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
+			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u frozen, %u scanned (%.2f%% of total)\n"),
							 vacrel->removed_pages,
							 vacrel->rel_pages,
+							 vacrel->newly_frozen_pages,
							 vacrel->scanned_pages,
							 orig_rel_pages == 0 ? 100.0 :
							 100.0 * vacrel->scanned_pages / orig_rel_pages);
@@ -1613,8 +1625,11 @@ lazy_scan_prune(LVRelState *vacrel,
				recently_dead_tuples;
	int			nnewlpdead;
	int			nfrozen;
-	TransactionId NewRelfrozenXid;
-	MultiXactId NewRelminMxid;
+	bool		force_freeze = false;
+	TransactionId NewRelfrozenXid,
+				NoFreezeNewRelfrozenXid;
+	MultiXactId NewRelminMxid,
+				NoFreezeNewRelminMxid;
	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
	xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
@@ -1625,8 +1640,8 @@ lazy_scan_prune(LVRelState *vacrel,
 retry:

	/* Initialize (or reset) page-level state */
-	NewRelfrozenXid = vacrel->NewRelfrozenXid;
-	NewRelminMxid = vacrel->NewRelminMxid;
+	NewRelfrozenXid = NoFreezeNewRelfrozenXid = vacrel->NewRelfrozenXid;
+	NewRelminMxid = NoFreezeNewRelminMxid = vacrel->NewRelminMxid;
	tuples_deleted = 0;
	lpdead_items = 0;
	live_tuples = 0;
@@ -1679,27 +1694,23 @@ retry:
			continue;
		}

-		/*
-		 * LP_DEAD items are processed outside of the loop.
-		 *
-		 * Note that we deliberately don't set hastup=true in the case of an
-		 * LP_DEAD item here, which is not how count_nondeletable_pages() does
-		 * it -- it only considers pages empty/truncatable when they have no
-		 * items at all (except LP_UNUSED items).
-		 *
-		 * Our assumption is that any LP_DEAD items we encounter here will
-		 * become LP_UNUSED inside lazy_vacuum_heap_page() before we actually
-		 * call count_nondeletable_pages().  In any case our opinion of
-		 * whether or not a page 'hastup' (which is how our caller sets its
-		 * vacrel->nonempty_pages value) is inherently race-prone.  It must be
-		 * treated as advisory/unreliable, so we might as well be slightly
-		 * optimistic.
-		 */
		if (ItemIdIsDead(itemid))
		{
+			/*
+			 * Delay unsetting all_visible until after we have decided on
+			 * whether this page should be frozen.  We need to test "is this
+			 * page all_visible, assuming any LP_DEAD items are set LP_UNUSED
+			 * in final heap pass?" to reach a decision.  all_visible will be
+			 * unset before we return, as required by lazy_scan_heap caller.
+			 *
+			 * Deliberately don't set hastup for LP_DEAD items.  We make the
+			 * soft assumption that any LP_DEAD items encountered here will
+			 * become LP_UNUSED later on, before count_nondeletable_pages is
+			 * reached.  Whether or not the page 'hastup' is inherently
+			 * race-prone.  It must be treated as unreliable by caller anyway,
+			 * so we might as well be slightly optimistic about it.
+			 */
			deadoffsets[lpdead_items++] = offnum;
-			prunestate->all_visible = false;
-			prunestate->has_lpdead_items = true;
			continue;
		}
@@ -1831,11 +1842,15 @@ retry:
			if (heap_prepare_freeze_tuple(tuple.t_data,
										  vacrel->relfrozenxid,
										  vacrel->relminmxid,
+										  vacrel->OldestXmin,
+										  vacrel->OldestMxact,
										  vacrel->FreezeLimit,
										  vacrel->MultiXactCutoff,
										  &frozen[nfrozen],
-										  &tuple_totally_frozen,
-										  &NewRelfrozenXid, &NewRelminMxid))
+										  &tuple_totally_frozen, &force_freeze,
+										  &NewRelfrozenXid, &NewRelminMxid,
+										  &NoFreezeNewRelfrozenXid,
+										  &NoFreezeNewRelminMxid))
			{
				/* Will execute freeze below */
				frozen[nfrozen++].offset = offnum;
@@ -1856,9 +1871,32 @@ retry:
	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
	 * that remains and needs to be considered for freezing now (LP_UNUSED and
	 * LP_REDIRECT items also remain, but are of no further interest to us).
+	 *
+	 * Freeze the page when it is about to become all-visible (either just
+	 * after we return control to lazy_scan_heap, or later on, during the
+	 * final heap pass).  Also freeze when heap_prepare_freeze_tuple forces us
+	 * to freeze (this is mandatory).  Freezing is typically forced because
+	 * there is at least one XID/XMID from before FreezeLimit/MultiXactCutoff.
	 */
-	vacrel->NewRelfrozenXid = NewRelfrozenXid;
-	vacrel->NewRelminMxid = NewRelminMxid;
+	if (prunestate->all_visible || force_freeze)
+	{
+		/*
+		 * We're freezing the page.  Our final NewRelfrozenXid doesn't need to
+		 * be affected by the XIDs/XMIDs that are just about to be frozen
+		 * anyway.
+		 */
+		vacrel->NewRelfrozenXid = NewRelfrozenXid;
+		vacrel->NewRelminMxid = NewRelminMxid;
+	}
+	else
+	{
+		/* This is comparable to lazy_scan_noprune's handling */
+		vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
+		vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
+
+		/* Forget heap_prepare_freeze_tuple's guidance on freezing */
+		nfrozen = 0;
+	}

	/*
	 * Consider the need to freeze any items with tuple storage from the page
@@ -1866,7 +1904,7 @@ retry:
	 */
	if (nfrozen > 0)
	{
-		Assert(prunestate->hastup);
+		vacrel->newly_frozen_pages++;

		/*
		 * At least one tuple with storage needs to be frozen -- execute that
@@ -1892,11 +1930,11 @@ retry:
		}

		/* Now WAL-log freezing if necessary */
-		if (RelationNeedsWAL(vacrel->rel))
+		if (RelationNeedsWAL(rel))
		{
			XLogRecPtr	recptr;

-			recptr = log_heap_freeze(vacrel->rel, buf, vacrel->FreezeLimit,
+			recptr = log_heap_freeze(rel, buf, NewRelfrozenXid,
									 frozen, nfrozen);
			PageSetLSN(page, recptr);
		}
	}
@@ -1919,7 +1957,7 @@ retry:
	 */
 #ifdef USE_ASSERT_CHECKING
	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible)
+	if (prunestate->all_visible && lpdead_items == 0)
	{
		TransactionId cutoff;
		bool		all_frozen;

		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
			Assert(false);

-		Assert(lpdead_items == 0);
		Assert(prunestate->all_frozen == all_frozen);

		/*
@@ -1949,9 +1986,6 @@ retry:
		VacDeadItems *dead_items = vacrel->dead_items;
		ItemPointerData tmp;

-		Assert(!prunestate->all_visible);
-		Assert(prunestate->has_lpdead_items);
-
		vacrel->lpdead_item_pages++;

		ItemPointerSetBlockNumber(&tmp, blkno);
@@ -1965,6 +1999,10 @@ retry:
		Assert(dead_items->num_items <= dead_items->max_items);
		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
									 dead_items->num_items);
+
+		/* lazy_scan_heap caller expects LP_DEAD item to unset all_visible */
+		prunestate->has_lpdead_items = true;
+		prunestate->all_visible = false;
	}

	/* Finally, add page-local counts to whole-VACUUM counts */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0ae3b4506..f1ea50454 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -957,6 +957,14 @@ get_all_vacuum_rels(int options)
  * FreezeLimit (at a minimum), and relminmxid up to multiXactCutoff (at a
  * minimum).
  *
+ * While non-aggressive VACUUMs are never required to advance relfrozenxid and
+ * relminmxid, they often do so in practice.  They freeze wherever possible,
+ * based on the same criteria that aggressive VACUUMs use.  FreezeLimit and
+ * multiXactCutoff still force freezing of older XIDs/XMIDs that did not get
+ * frozen based on the standard criteria, though.  (Actually, these cutoffs
+ * won't force non-aggressive VACUUMs to freeze pages that cannot be cleanup
+ * locked without waiting.)
+ *
  * oldestXmin and oldestMxact are the most recent values that can ever be
  * passed to vac_update_relstats() as frozenxid and minmulti arguments by our
  * vacuumlazy.c caller later on.  These values should be passed when it turns
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 6a02d0fa8..4d585a265 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -565,11 +565,10 @@
    the relfrozenxid column of a table's pg_class row contains
    the oldest remaining XID at the end of the most recent VACUUM
-    that successfully advanced relfrozenxid
-    (typically the most recent aggressive VACUUM).  All rows inserted
-    by transactions with XIDs older than this cutoff XID are
-    guaranteed to have been frozen.  Similarly,
-    the datfrozenxid column of a database's
+    that successfully advanced relfrozenxid.
+    All rows inserted by transactions with XIDs older than this cutoff
+    XID are guaranteed to have been frozen.  Similarly, the
+    datfrozenxid column of a database's
    pg_database row is a lower bound on the unfrozen XIDs appearing in
    that database — it is just the minimum of the per-table
    relfrozenxid values within the database.
-- 
2.30.2