From 27edfce6bc6414e79d79741248cfe5eae258e92a Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sat, 22 Apr 2023 13:04:13 -0700
Subject: [PATCH v3 9/9] Overhaul freezing and wraparound docs.

This is almost a complete rewrite. "Preventing Transaction ID Wraparound Failures" becomes "Freezing to manage the transaction ID space". This is follow-up work to commit 1de58df4, which added page-level freezing to VACUUM.

The emphasis is now on the physical work of freezing pages. This flows a little better than it otherwise would due to recent structural cleanups to maintenance.sgml; discussion about freezing now immediately follows discussion of cleanup of dead tuples.

We still talk about the problem of the system activating xidStopLimit protections in the same section, but we use much less alarmist language about data corruption, and are no longer overly concerned about the very worst case. We don't rescind the recommendation that users recover from an xidStopLimit outage by using single user mode, though that seems like something we should aim to do in the near future.

There is no longer a separate sect3 to discuss MultiXactId related issues. VACUUM now performs exactly the same processing steps when it freezes a page, independent of the trigger condition.

Also describe the page-level freezing FPI optimization added by commit 1de58df4. This is expected to trigger the majority of all freezing with many types of workloads.
--- doc/src/sgml/config.sgml | 20 +- doc/src/sgml/logicaldecoding.sgml | 2 +- doc/src/sgml/maintenance.sgml | 967 ++++++++++++++++------ doc/src/sgml/ref/create_table.sgml | 2 +- doc/src/sgml/ref/prepare_transaction.sgml | 2 +- doc/src/sgml/ref/vacuum.sgml | 6 +- doc/src/sgml/ref/vacuumdb.sgml | 4 +- doc/src/sgml/xact.sgml | 2 +- 8 files changed, 724 insertions(+), 281 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b56f073a9..a4ac4e740 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -8359,7 +8359,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Note that even when this parameter is disabled, the system will launch autovacuum processes if necessary to prevent transaction ID wraparound. See for more information. + linkend="freezing-xid-space"/> for more information. @@ -8548,7 +8548,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; This parameter can only be set at server start, but the setting can be reduced for individual tables by changing table storage parameters. - For more information see . + For more information see . @@ -8577,7 +8577,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; 400 million multixacts. This parameter can only be set at server start, but the setting can be reduced for individual tables by changing table storage parameters. - For more information see . + For more information see . @@ -9284,7 +9284,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; periodic manual VACUUM has a chance to run before an anti-wraparound autovacuum is launched for the table. For more information see - . + . @@ -9306,7 +9306,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; the value of , so that there is not an unreasonably short time between forced autovacuums. For more information see . + linkend="freezing-xid-space"/>. 
@@ -9343,7 +9343,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; set this value anywhere from zero to 2.1 billion, VACUUM will silently adjust the effective value to no less than 105% of . + linkend="guc-autovacuum-freeze-max-age"/>. For more + information see . @@ -9367,7 +9368,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; , so that a periodic manual VACUUM has a chance to run before an anti-wraparound is launched for the table. - For more information see . + For more information see . @@ -9388,7 +9389,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; the value of , so that there is not an unreasonably short time between forced autovacuums. - For more information see . + For more information see . @@ -9421,7 +9422,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; this value anywhere from zero to 2.1 billion, VACUUM will silently adjust the effective value to no less than 105% of . + linkend="guc-autovacuum-multixact-freeze-max-age"/>. For more + information see . diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index cbd3aa804..80dade3be 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -353,7 +353,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU because neither required WAL nor required rows from the system catalogs can be removed by VACUUM as long as they are required by a replication slot. In extreme cases this could cause the database to shut down to prevent - transaction ID wraparound (see ). + transaction ID wraparound (see ). So if a slot is no longer required it should be dropped. diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 5546d8c7d..a480e4f8e 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -148,13 +148,8 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac . 
Such vacuums may allow portions of the table to be marked as all visible and also allow tuples to be frozen, which - can reduce the work required in subsequent vacuums. - For tables which receive INSERT operations but no or - almost no UPDATE/DELETE operations, - it may be beneficial to lower the table's - as this may allow - tuples to be frozen by earlier vacuums. The number of obsolete tuples and - the number of inserted tuples are obtained from the cumulative statistics system; + can reduce the work required in subsequent vacuums. The number of obsolete tuples + and the number of inserted tuples are obtained from the cumulative statistics system; it is a semi-accurate count updated by each UPDATE, DELETE and INSERT operation. (It is only semi-accurate because some information might be lost under heavy @@ -273,15 +268,20 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu - To protect against loss of very old data due to - transaction ID wraparound or - multixact ID wraparound. + To maintain the system's ability to allocate new + transaction IDs through freezing. To update the visibility map, which speeds up index-only - scans. + scans, and helps the next VACUUM + operation avoid needlessly scanning already-frozen pages. + + + + To truncate obsolescent transaction status information, + when possible. @@ -483,302 +483,671 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu - - Preventing Transaction ID Wraparound Failures - - - transaction ID - wraparound - + + Freezing to manage the transaction ID space - wraparound - of transaction IDs + Freezing + of transaction IDs and MultiXact IDs - PostgreSQL's MVCC transaction semantics depend on - being able to compare transaction - ID numbers (XID) to determine - whether or not the row is visible to each query's MVCC snapshot - (see ). 
But since - on-disk storage of transaction IDs in heap pages uses a truncated - 32-bit representation to save space (rather than the full 64-bit - representation), it is necessary to vacuum every table in every - database at least once every two billion - transactions (though far more frequent vacuuming is typical). + VACUUM often marks some of the pages that it + scans frozen, indicating that all eligible + rows on the page were inserted by a transaction that committed + sufficiently far in the past that the effects of the inserting + transaction are certain to be visible to all current and future + transactions. The specific Transaction ID number + (XID) stored in a frozen heap row's + xmin field is no longer needed to + determine anything about the row's visibility. Furthermore, when + a row undergoing freezing happens to have an XID set in its + xmax field (possibly an XID left behind + by an earlier SELECT FOR UPDATE row locker), + the xmax field's XID is usually also + removed. - - controls how old an XID value has to be before rows bearing that XID will be - frozen. Increasing this setting may avoid unnecessary work if the - rows that would otherwise be frozen will soon be modified again, - but decreasing this setting increases - the number of transactions that can elapse before the table must be - vacuumed again. + Once frozen, heap pages are self-contained. Every + query can read all of the page's rows in a way that assumes that + the inserting transaction committed and is visible to its + MVCC snapshot. No query will ever have to + consult external transaction status metadata to interpret the + page's contents, either. In particular, + pg_xact transaction XID commit/abort status + lookups won't take place during query execution. - VACUUM uses the visibility map - to determine which pages of a table must be scanned. Normally, it - will skip pages that don't have any dead row versions even if those pages - might still have row versions with old XID values. 
Therefore, normal - VACUUMs won't always freeze every old row version in the table. - When that happens, VACUUM will eventually need to perform an - aggressive vacuum, which will freeze all eligible unfrozen - XID and MXID values, including those from all-visible but not all-frozen pages. - In practice most tables require periodic aggressive vacuuming. - - controls when VACUUM does that: all-visible but not all-frozen - pages are scanned if the number of transactions that have passed since the - last such scan is greater than vacuum_freeze_table_age minus - vacuum_freeze_min_age. Setting - vacuum_freeze_table_age to 0 forces VACUUM to - always use its aggressive strategy. + Freezing is a WAL-logged operation, so when + VACUUM freezes a heap page, any copy of the + page located on a physical replication standby server will itself + be frozen shortly thereafter (when the relevant + FREEZE_PAGE WAL record is + replayed on the standby). Queries that run on physical + replication standbys thereby avoid pg_xact + lookups when reading from frozen pages, in just the same way as + queries that run on the primary server + + + In this regard freezing is unlike setting transaction status + hint bits in tuple headers: setting hint bits + doesn't usually need to be WAL-logged, and + can take place on physical replication standby servers without + the involvement of the primary server. The purpose of hint bits + is to avoid repeat pg_xact lookups for the + same tuples, strictly as an optimization. The purpose of + freezing (from the point of view of individual tuples) is to + reliably remove each tuple's dependency on + pg_xact, ultimately making it safe to + truncate pg_xact from time to time. + + . - The maximum time that a table can go unvacuumed is two billion - transactions minus the vacuum_freeze_min_age value at - the time of the last aggressive vacuum. If it were to go - unvacuumed for longer than - that, data loss could result. 
To ensure that this does not happen, - autovacuum is invoked on any table that might contain unfrozen rows with - XIDs older than the age specified by the configuration parameter . (This will happen even if - autovacuum is disabled.) + It can be useful for VACUUM to put off some of + the work of freezing, but VACUUM cannot put off + freezing forever. Since on-disk storage of transaction IDs in + heap row headers uses a truncated 32-bit representation to save + space (rather than the full 64-bit representation), freezing plays + a crucial role in enabling management of the XID address + space by VACUUM. If, for whatever + reason, VACUUM is unable to freeze older XIDs + on behalf of an application that continues to require new XID + allocations, the system will eventually + refuse to allocate new transaction IDs. + The system generally only enters this state when autovacuum is + misconfigured. - This implies that if a table is not otherwise vacuumed, - autovacuum will be invoked on it approximately once every - autovacuum_freeze_max_age minus - vacuum_freeze_min_age transactions. - For tables that are regularly vacuumed for space reclamation purposes, - this is of little importance. However, for static tables - (including tables that receive inserts, but no updates or deletes), - there is no need to vacuum for space reclamation, so it can - be useful to try to maximize the interval between forced autovacuums - on very large static tables. Obviously one can do this either by - increasing autovacuum_freeze_max_age or decreasing - vacuum_freeze_min_age. + controls when freezing + takes place. When VACUUM scans a heap page + containing even one XID that has already attained an age exceeding + this value, the page is frozen. + + + + MultiXact ID + Freezing of + + + + MultiXact IDs are used to support row + locking by multiple transactions. 
Since there is only limited + space in a tuple header to store lock information, that + information is encoded as a multiple transaction + ID, or MultiXact ID for short, whenever there is more + than one transaction concurrently locking a row. Information + about which transaction IDs are included in any particular + MultiXact ID is stored separately in + pg_multixact, and only the MultiXact ID + itself (a 32-bit unsigned integer) appears in the tuple's + xmax field. This creates a dependency + on external transaction status information similar to the + dependency that ordinary unfrozen XIDs have on commit status + information stored in pg_xact. + VACUUM must therefore occasionally remove + MultiXact IDs from tuples during freezing. - The effective maximum for vacuum_freeze_table_age is 0.95 * - autovacuum_freeze_max_age; a setting higher than that will be - capped to the maximum. A value higher than - autovacuum_freeze_max_age wouldn't make sense because an - anti-wraparound autovacuum would be triggered at that point anyway, and - the 0.95 multiplier leaves some breathing room to run a manual - VACUUM before that happens. As a rule of thumb, - vacuum_freeze_table_age should be set to a value somewhat - below autovacuum_freeze_max_age, leaving enough gap so that - a regularly scheduled VACUUM or an autovacuum triggered by - normal delete and update activity is run in that window. Setting it too - close could lead to anti-wraparound autovacuums, even though the table - was recently vacuumed to reclaim space, whereas lower values lead to more - frequent aggressive vacuuming. + also + controls when freezing takes place. It is analogous to + vacuum_freeze_min_age, but age + is expressed in units of MultiXact ID. 
+ Lowering vacuum_multixact_freeze_min_age + forces VACUUM to process + xmax fields containing a MultiXact ID + in cases where it would otherwise opt to put off the work of + processing xmax until the next + VACUUM + + Freezing of xmax + fields (whether they were found to contain an XID or a MultiXact + ID) generally means clearing xmax. + VACUUM may occasionally encounter an + individual MultiXact ID that must be removed to advance + relminmxid by the required amount, + which can only be processed by generating a replacement + MultiXact ID (containing just the non-removable subset of member + XIDs from the original MultiXact ID), and then setting the + tuple's xmax to the new/replacement + MultiXact ID value. + + . The setting generally doesn't significantly + influence the total number of pages VACUUM + freezes, even in tables that contain relatively many MultiXact + IDs. This is because VACUUM generally prefers + to proactively process most individual + xmax fields that contain a MultiXact ID + (eager proactive processing is typically cheaper). - The sole disadvantage of increasing autovacuum_freeze_max_age - (and vacuum_freeze_table_age along with it) is that - the pg_xact and pg_commit_ts - subdirectories of the database cluster will take more space, because it - must store the commit status and (if track_commit_timestamp is - enabled) timestamp of all transactions back to - the autovacuum_freeze_max_age horizon. The commit status uses - two bits per transaction, so if - autovacuum_freeze_max_age is set to its maximum allowed value - of two billion, pg_xact can be expected to grow to about half - a gigabyte and pg_commit_ts to about 20GB. If this - is trivial compared to your total database size, - setting autovacuum_freeze_max_age to its maximum allowed value - is recommended. Otherwise, set it depending on what you are willing to - allow for pg_xact and pg_commit_ts storage. 
- (The default, 200 million transactions, translates to about 50MB - of pg_xact storage and about 2GB of pg_commit_ts - storage.) + Managing the added WAL volume from freezing + over time is an important consideration for + VACUUM. It is why VACUUM + doesn't just freeze every eligible tuple at the earliest + opportunity: the WAL written to freeze a page's + tuples goes to waste in cases where the resulting + frozen tuples are soon deleted or updated anyway. It's also why + VACUUM will freeze all + eligible tuples from a heap page once the decision to freeze at + least one tuple is taken: at that point the added cost to freeze + all eligible tuples eagerly (measured in extra bytes of + WAL written) is far lower than the + probable cost of deferring freezing until a future + VACUUM operation against the same table. + Furthermore, once the page is frozen it can generally be marked as + all-frozen in the visibility map right away. - - One disadvantage of decreasing vacuum_freeze_min_age is that - it might cause VACUUM to do useless work: freezing a row - version is a waste of time if the row is modified - soon thereafter (causing it to acquire a new XID). So the setting should - be large enough that rows are not frozen until they are unlikely to change - any more. - + + + In PostgreSQL versions before 16, + VACUUM triggered freezing at the level of + individual xmin and + xmax fields. Freezing only affected + the exact XIDs that had already attained an age of + vacuum_freeze_min_age or greater. + + - To track the age of the oldest unfrozen XIDs in a database, - VACUUM stores XID - statistics in the system tables pg_class and - pg_database. In particular, - the relfrozenxid column of a table's - pg_class row contains the oldest remaining unfrozen - XID at the end of the most recent VACUUM that successfully - advanced relfrozenxid (typically the most recent - aggressive VACUUM). 
Similarly, the - datfrozenxid column of a database's - pg_database row is a lower bound on the unfrozen XIDs - appearing in that database — it is just the minimum of the - per-table relfrozenxid values within the database. - A convenient way to - examine this information is to execute queries such as: - - -SELECT c.oid::regclass as table_name, - greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age -FROM pg_class c -LEFT JOIN pg_class t ON c.reltoastrelid = t.oid -WHERE c.relkind IN ('r', 'm'); - -SELECT datname, age(datfrozenxid) FROM pg_database; - - - The age column measures the number of transactions from the - cutoff XID to the current transaction's XID. + VACUUM also triggers freezing of a page in + cases where it already proved necessary to write out a full page + image (FPI) as part of a WAL + record describing how dead tuples were removed + + Actually, the freeze on an FPI + write mechanism isn't just triggered whenever + VACUUM needed to write an + FPI for torn page protection as part of + writing a PRUNE WAL record + describing how dead tuples were removed. The + FPI mechanism can also be triggered when hint + bits are set by VACUUM, if and only if doing + so necessitates writing an FPI. + WAL-logging in order to set hint bits is only + possible when the option is + enabled in postgresql.conf, or when data + checksums were enabled when the cluster was initialized with + . + + (see for background + information about how FPIs provide torn page + protection). This freeze on an FPI + write batching mechanism often avoids the need for some + future VACUUM operation to write an additional + FPI for the same page as part of a + WAL record describing how live tuples were + frozen. In effect, VACUUM writes slightly more + WAL in the short term with the aim of + ultimately needing to write much less WAL in + the long term. - When the VACUUM command's VERBOSE - parameter is specified, VACUUM prints various - statistics about the table. 
This includes information about how - relfrozenxid and - relminmxid advanced, and the number of - newly frozen pages. The same details appear in the server log when - autovacuum logging (controlled by ) reports on a - VACUUM operation executed by autovacuum. + For tables which receive INSERT operations, + but few or no UPDATE/DELETE + operations, it may be beneficial to selectively lower for the table. + VACUUM may thereby be able to freeze the + table's pages eagerly during earlier autovacuums + triggered by . - - VACUUM normally only scans pages that have been modified - since the last vacuum, but relfrozenxid can only be - advanced when every page of the table - that might contain unfrozen XIDs is scanned. This happens when - relfrozenxid is more than - vacuum_freeze_table_age transactions old, when - VACUUM's FREEZE option is used, or when all - pages that are not already all-frozen happen to - require vacuuming to remove dead row versions. When VACUUM - scans every page in the table that is not already all-frozen, it should - set age(relfrozenxid) to a value just a little more than the - vacuum_freeze_min_age setting - that was used (more by the number of transactions started since the - VACUUM started). VACUUM - will set relfrozenxid to the oldest XID - that remains in the table, so it's possible that the final value - will be much more recent than strictly required. - If no relfrozenxid-advancing - VACUUM is issued on the table until - autovacuum_freeze_max_age is reached, an autovacuum will soon - be forced for the table. - + + + VACUUM may not be able to freeze every tuple's + xmin in relatively rare cases. The + criterion that determines basic eligibility for freezing is the + same as the one that determines if a deleted tuple can be + removed: the XID-based removable cutoff that + appears in the server log's autovacuum log reports (controlled by + ). 
+ + + In extreme cases, a long-running transaction can hold back every + VACUUM's removable cutoff for so long that the + system is forced to activate xidStopLimit mode + protections. + + - - If for some reason autovacuum fails to clear old XIDs from a table, the - system will begin to emit warning messages like this when the database's - oldest XIDs reach forty million transactions from the wraparound point: + + Aggressive <command>VACUUM</command> + + + transaction ID + wraparound + + + + wraparound + of transaction IDs and MultiXact IDs + + + + As noted already, freezing doesn't just allow queries to avoid + lookups of subsidiary transaction status information in + structures such as pg_xact. Freezing also + plays a crucial role in enabling management of the XID address + space by VACUUM. VACUUM + maintains information about the oldest unfrozen XID that remains + in the table when it uses its aggressive strategy. + + + + Aggressive VACUUM will update the table's + pg_class.relfrozenxid + to the value that it determined to be the oldest remaining XID; + the table's relfrozenxid + advances by a certain number of XIDs. Aggressive + VACUUM may also need to update the + datfrozenxid column of the database's + pg_database row in turn. + datfrozenxid is a lower bound on the + unfrozen XIDs appearing in that database — it is just the + minimum of the per-table relfrozenxid + values (the relfrozenxid that has + attained the greatest age) within the database. + + + + Aggressive VACUUM also maintains the + pg_class.relminmxid + and pg_database.datminmxid + fields. These are needed to track the oldest MultiXact ID that + remains in the table and database, respectively. + + + + The extra steps performed within every aggressive + VACUUM against every table have the overall + effect of tracking the oldest remaining unfrozen transaction ID + in the entire cluster (every table from every database). 
+ Aggressive VACUUMs will (in the aggregate and + over time) make sure that the oldest unfrozen transaction ID in + the entire system is never too far in the past. + + + + Managing the Transaction ID Space + + Freezing removes local dependencies on + external transaction status information from individual heap + pages. Advancing relfrozenxid + removes global dependencies from whole + tables in turn. + + + The oldest XID in the entire cluster can be thought of as the + beginning of the XID space, while the next unallocated XID can + be thought of as the end of the XID space. This space + represents the range of XIDs that might still require + transaction commit/abort status lookups in pg_xact. + + + + + The maximum XID age that the system can tolerate (i.e., the + maximum distance between the oldest unfrozen + transaction ID in any table according to + pg_class.relfrozenxid, + and the next unallocated transaction ID) is about 2.1 billion + transaction IDs. This maximum XID age invariant + makes it fundamentally impossible to put off aggressive + VACUUMs (and freezing) forever + + + Aggressive VACUUMs cannot be put off + forever, barring the edge-case where the + installation is never expected to consume more than about 2.1 + billion XIDs. In practice this has practical + relevance. + + . The invariant imposes an absolute hard limit on how + long any table can go without an aggressive VACUUM. + + + + If the hard limit is ever reached, then the system will activate + xidStopLimit + mode, which temporarily prevents the allocation of new + permanent transaction IDs. The system will only deactivate + xidStopLimit mode when + VACUUM (typically run by autovacuum) succeeds + in advancing the oldest datfrozenxid in the + cluster (via an aggressive VACUUM that runs to + completion against the table that has the oldest + relfrozenxid). 
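Annotation (not part of the patch): the "about 2.1 billion" figure in the paragraphs above is 2^31, half of the 32-bit XID space. The Python sketch below is a simplified model of that arithmetic, assuming only normal XIDs (the handful of reserved special XID values are ignored). The names `xid_age`, `xid_precedes`, and `MAX_XID_AGE` are hypothetical and chosen for illustration; they are not PostgreSQL functions.

```python
XID_MODULUS = 2**32   # on-disk XIDs are truncated to 32 bits
MAX_XID_AGE = 2**31   # about 2.1 billion; hypothetical name for the hard limit


def xid_age(next_xid: int, oldest_unfrozen_xid: int) -> int:
    """Distance from the oldest unfrozen XID to the next unallocated XID.

    Computed modulo 2**32, so it stays correct after the 32-bit
    counter wraps around.
    """
    return (next_xid - oldest_unfrozen_xid) % XID_MODULUS


def xid_precedes(a: int, b: int) -> bool:
    """True if 32-bit XID a is logically older than b.

    Equivalent to checking that the difference (a - b), reinterpreted
    as a signed 32-bit integer, is negative. The answer is only
    meaningful while all live XIDs span less than MAX_XID_AGE.
    """
    return (a - b) % XID_MODULUS >= MAX_XID_AGE


if __name__ == "__main__":
    # An XID allocated shortly before wraparound is still "older" than
    # a small XID allocated just after it.
    old, new = 2**32 - 10, 5
    print(xid_age(new, old), xid_precedes(old, new))
```

Because the comparison can only distinguish "older" from "newer" within a window of 2^31 XIDs, the system has to stop allocating XIDs before the oldest unfrozen XID falls out of that window, which is what the xidStopLimit behavior described above enforces.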
+ + + + The 2.1 billion XIDs maximum XID age invariant + must be preserved because transaction IDs stored in heap row + headers use a truncated 32-bit representation (rather than the + full 64-bit representation). Since all unfrozen transaction IDs + from heap tuple headers must be from the + same transaction ID epoch (or from a space in the 64-bit + representation that spans two adjoining transaction ID epochs), + there isn't any need to store a separate epoch field in each + tuple header (see for + further details). This scheme has the advantage of requiring + much less on-disk storage space than a design that stores an XID + epoch alongside each XID stored in each heap tuple header. It + has the disadvantage of constraining the system's ability to + allocate new XIDs in the worst case scenario where + xidStopLimit mode is used to preserve the + maximum XID age invariant. + + + + There is only one major runtime behavioral difference between + aggressive mode VACUUMs and non-aggressive + VACUUMs: only non-aggressive + VACUUMs will skip pages that don't have any + dead row versions even if those pages still have row versions + with old XID values (pages marked as all-visible in the + visibility map). Aggressive VACUUMs can only + skip pages that are marked as both all-visible and all-frozen. + Consequently, non-aggressive VACUUMs usually + won't freeze every page containing an XID + that has already attained an age of + vacuum_freeze_min_age or more. Failing to + freeze older pages during non-aggressive + VACUUMs may lead to aggressive + VACUUMs that perform a disproportionately + large amount of the work of freezing required by one particular + table. + + + + + When the VACUUM command's + VERBOSE parameter is specified, + VACUUM prints various statistics about the + table. Its output includes information about how + relfrozenxid and + relminmxid advanced, and the number + of newly frozen pages. 
The same details appear in the server + log when autovacuum logging (controlled by ) reports on a + VACUUM operation executed by autovacuum. + + + + + + In practice, most tables require periodic aggressive vacuuming. + However, some individual non-aggressive + VACUUM operations may be able to advance + relfrozenxid and/or + relminmxid. Non-aggressive + relfrozenxid/relminmxid + advancement is most common in small, frequently modified tables. + + + + + Most individual tables will eventually need an aggressive + VACUUM, which will reliably freeze all pages + with XID (or MultiXact ID) values older than + vacuum_freeze_min_age (or older than + vacuum_multixact_freeze_min_age), including + those from all-visible but not all-frozen pages (and then advance + pg_class.relfrozenxid + to a value that reflects all that). controls when + VACUUM must use its aggressive strategy. If + age(relfrozenxid) exceeds + vacuum_freeze_table_age at the start of + VACUUM, that VACUUM will + use the aggressive strategy; otherwise the standard + non-aggressive strategy is used. Setting + vacuum_freeze_table_age to 0 forces + VACUUM to always use its aggressive strategy. + + + + + Anti-Wraparound Autovacuums + + + To ensure that every table has its + relfrozenxid advanced at somewhat + regular intervals, even in the case of completely static tables, + autovacuum runs against any table that might contain unfrozen + rows with XIDs older than the age specified by the configuration + parameter . These + are anti-wraparound autovacuums. + Anti-wraparound autovacuums can happen even when autovacuum is + nominally disabled in postgresql.conf. + + + + In practice, all anti-wraparound autovacuums will use + VACUUM's aggressive strategy (if they didn't, + then it would defeat the whole purpose of anti-wraparound + autovacuuming). 
Use of VACUUM's aggressive + strategy is certain, because the effective value of + vacuum_freeze_table_age is silently + clamped to a value no greater than 95% of the + current value of autovacuum_freeze_max_age. + + + + As a rule of thumb, vacuum_freeze_table_age + should be set to a value somewhat below + autovacuum_freeze_max_age, so that there is a + window during which any autovacuum triggered by inserts, updates, + or deletes (or any manually issued VACUUM) + will become an aggressive VACUUM. Such + VACUUMs will reliably advance + relfrozenxid in passing, even though + autovacuum won't have specifically set out to make sure + relfrozenxid advances through + anti-wraparound autovacuuming. Anti-wraparound autovacuums may + never be required at all in tables that regularly require + vacuuming to reclaim + space from dead tuples and/or to set pages all-visible in the + visibility map (especially if + vacuum_freeze_table_age is set to a value + significantly below + autovacuum_freeze_max_age). + + + + Note on terminology + + Aggressive VACUUM is a special form of + VACUUM. An aggressive + VACUUM must advance + relfrozenxid up to an XID value that + is no greater than vacuum_freeze_min_age XIDs + in age as of the start of the + VACUUM operation. + + + Anti-wraparound autovacuum is a special form of Autovacuum. Its + purpose is to make sure that + relfrozenxid is advanced when no + earlier aggressive VACUUM ran and advanced + relfrozenxid in passing (often + because no VACUUM needed to run against the + table at all). + + + There is only one runtime behavioral difference between + anti-wraparound autovacuums and other autovacuums that happen to + end up running an aggressive VACUUM: + Anti-wraparound autovacuums cannot be + autocancelled. This means that autovacuum workers + that perform anti-wraparound autovacuuming do not yield to + conflicting relation-level lock requests (e.g., from + ALTER TABLE). See for a full explanation. + + + + + VACUUM also applies and . 
These are + independent MultiXact ID based triggers of aggressive + VACUUM (and anti-wraparound autovacuum). They + are applied by following rules analogous to the rules already + described for vacuum_freeze_table_age and + autovacuum_freeze_max_age, respectively + + + Though note that autovacuum (and VACUUM) use a lower + effective + autovacuum_multixact_freeze_max_age + value (determined dynamically) to deal with issues with + truncation of the SLRU storage areas, as + explained in + + . + + + + It doesn't matter if it was vacuum_freeze_table_age or + vacuum_multixact_freeze_table_age that + triggered VACUUM's decision to use its + aggressive strategy. Every aggressive + VACUUM will advance + relfrozenxid and + relminmxid by following the same + generic steps at runtime. + + + + A convenient way to examine information about + relfrozenxid and + relminmxid is to execute queries such as: + + +SELECT c.oid::regclass as table_name, +greatest(age(c.relfrozenxid), + age(t.relfrozenxid)) as xid_age, +mxid_age(c.relminmxid) +FROM pg_class c +LEFT JOIN pg_class t ON c.reltoastrelid = t.oid +WHERE c.relkind IN ('r', 'm'); + +SELECT datname, +age(datfrozenxid) as xid_age, +mxid_age(datminmxid) +FROM pg_database; + + + The age function returns the number of + transactions from relfrozenxid to the + next unallocated transaction ID. The + mxid_age function returns the number of MultiXact + IDs from relminmxid to the next + unallocated MultiXact ID. + + + + The system should always have significant XID allocation slack + capacity. Ideally, the greatest + age(relfrozenxid)/age(datfrozenxid) + in the system will never be more than a fraction of the 2.1 + billion XID hard limit described in . The default + autovacuum_freeze_max_age setting of 200 million + transactions implies that the system should never use + significantly more than about 10% of that hard limit. 
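Annotation (not part of the patch): the Python sketch below models the XID-based trigger rules from this section, under stated assumptions. It applies the 95% clamp on vacuum_freeze_table_age, the aggressive-strategy test, the anti-wraparound test, and the share of the hard limit consumed by a given age(relfrozenxid). All function names are hypothetical. The two default values are the documented PostgreSQL defaults, and the hard limit is taken as 2^31. The real server also applies MultiXact-based triggers and other adjustments that are not modelled here.

```python
HARD_LIMIT = 2**31  # roughly 2.1 billion XIDs

# Documented defaults, used here only as example inputs.
DEFAULT_VACUUM_FREEZE_TABLE_AGE = 150_000_000
DEFAULT_AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000


def effective_freeze_table_age(vacuum_freeze_table_age: int,
                               autovacuum_freeze_max_age: int) -> int:
    """Clamp vacuum_freeze_table_age to at most 95% of autovacuum_freeze_max_age."""
    return min(vacuum_freeze_table_age, autovacuum_freeze_max_age * 95 // 100)


def uses_aggressive_strategy(relfrozenxid_age: int,
                             vacuum_freeze_table_age: int,
                             autovacuum_freeze_max_age: int) -> bool:
    """True if a VACUUM starting now would use the aggressive strategy."""
    limit = effective_freeze_table_age(vacuum_freeze_table_age,
                                       autovacuum_freeze_max_age)
    return relfrozenxid_age > limit


def antiwraparound_autovacuum_due(relfrozenxid_age: int,
                                  autovacuum_freeze_max_age: int) -> bool:
    """True if the table is old enough to force an anti-wraparound autovacuum."""
    return relfrozenxid_age > autovacuum_freeze_max_age


def hard_limit_fraction(relfrozenxid_age: int) -> float:
    """Share of the 2**31 XID hard limit consumed by this age."""
    return relfrozenxid_age / HARD_LIMIT


if __name__ == "__main__":
    age = 160_000_000
    print(uses_aggressive_strategy(age,
                                   DEFAULT_VACUUM_FREEZE_TABLE_AGE,
                                   DEFAULT_AUTOVACUUM_FREEZE_MAX_AGE))
    print(antiwraparound_autovacuum_due(age, DEFAULT_AUTOVACUUM_FREEZE_MAX_AGE))
    print(f"{hard_limit_fraction(200_000_000):.1%}")
```

Under these assumptions, a table whose age(relfrozenxid) is 160 million gets an aggressive VACUUM the next time any VACUUM runs against it, but it does not yet force an anti-wraparound autovacuum. An age of 200 million is a bit over 9% of the hard limit.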
+ + + + There is little advantage in routinely allowing the greatest + age(relfrozenxid) in the system to get + anywhere near to the 2.1 billion XID hard limit. Putting off the + work of freezing can only reduce the absolute amount of + WAL written by VACUUM when + VACUUM thereby completely avoids freezing rows + that are deleted before long anyway. There is little or no + disadvantage from lowering vacuum_freeze_table_age + to make aggressive VACUUMs more frequent, at + least in tables where newly frozen pages almost always remain + all-frozen forever. Note also that anything that leads to + relfrozenxid and + relminmxid advancing less frequently + (such as a higher vacuum_freeze_table_age + setting) will also increase the on-disk space required to store + additional transaction status information, as described in . + + + + + + <literal>xidStopLimit</literal> mode + + If for some reason autovacuum utterly fails to advance any + table's relfrozenxid or + relminmxid for an extended period, and + if XIDs and/or MultiXact IDs continue to be allocated, the system + will begin to emit warning messages like this when the database's + oldest XIDs reach forty million transactions from the 2.1 billion + XID hard limit described in : WARNING: database "mydb" must be vacuumed within 39985967 transactions HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. - (A manual VACUUM should fix the problem, as suggested by the - hint; but note that the VACUUM must be performed by a - superuser, else it will fail to process system catalogs and thus not - be able to advance the database's datfrozenxid.) 
- If these warnings are - ignored, the system will shut down and refuse to start any new - transactions once there are fewer than three million transactions left - until wraparound: + (A manual VACUUM should fix the problem, as suggested by the + hint; but note that the VACUUM must be performed by a + superuser, else it will fail to process system catalogs and thus not + be able to advance the database's datfrozenxid.) + If these warnings are ignored, the system will eventually refuse + to start any new transactions. This happens at the point that + there are fewer than three million transactions left: ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb" HINT: Stop the postmaster and vacuum that database in single-user mode. - The three-million-transaction safety margin exists to let the - administrator recover without data loss, by manually executing the - required VACUUM commands. However, since the system will not - execute commands once it has gone into the safety shutdown mode, - the only way to do this is to stop the server and start the server in single-user - mode to execute VACUUM. The shutdown mode is not enforced - in single-user mode. See the reference - page for details about using single-user mode. - - - - Multixacts and Wraparound - - - MultiXactId - - - - wraparound - of multixact IDs - - - - Multixact IDs are used to support row locking by - multiple transactions. Since there is only limited space in a tuple - header to store lock information, that information is encoded as - a multiple transaction ID, or multixact ID for short, - whenever there is more than one transaction concurrently locking a - row. Information about which transaction IDs are included in any - particular multixact ID is stored separately in - the pg_multixact subdirectory, and only the multixact ID - appears in the xmax field in the tuple header. 
- Like transaction IDs, multixact IDs are implemented as a - 32-bit counter and corresponding storage, all of which requires - careful aging management, storage cleanup, and wraparound handling. - There is a separate storage area which holds the list of members in - each multixact, which also uses a 32-bit counter and which must also - be managed. + The three-million-transaction safety margin exists to let the + administrator recover without data loss, by manually executing the + required VACUUM commands. However, since the system will not + execute commands once it has gone into the safety shutdown mode, + the only way to do this is to stop the server and start the server in single-user + mode to execute VACUUM. The shutdown mode is not enforced + in single-user mode. See the reference + page for details about using single-user mode. - Whenever VACUUM scans any part of a table, it will replace - any multixact ID it encounters which is older than - - by a different value, which can be the zero value, a single - transaction ID, or a newer multixact ID. For each table, - pg_class.relminmxid stores the oldest - possible multixact ID still appearing in any tuple of that table. - If this value is older than - , an aggressive - vacuum is forced. As discussed in the previous section, an aggressive - vacuum means that only those pages which are known to be all-frozen will - be skipped. mxid_age() can be used on - pg_class.relminmxid to find its age. - - - - Aggressive VACUUMs, regardless of what causes - them, are guaranteed to be able to advance - the table's relminmxid. - Eventually, as all tables in all databases are scanned and their - oldest multixact values are advanced, on-disk storage for older - multixacts can be removed. - - - - As a safety device, an aggressive vacuum scan will - occur for any table whose multixact-age is greater than . 
Also, if the - storage occupied by multixacts members exceeds 2GB, aggressive vacuum - scans will occur more often for all tables, starting with those that - have the oldest multixact-age. Both of these kinds of aggressive - scans will occur even if autovacuum is nominally disabled. + In emergencies, VACUUM will take extraordinary + measures to avoid xidStopLimit mode. A + failsafe mechanism is triggered when the table's + relfrozenxid attains an age of XIDs, or when the table's + relminmxid attains an age of MultiXact IDs. + The failsafe prioritizes advancing + relfrozenxid and/or + relminmxid as quickly as possible. + Once the failsafe triggers, VACUUM bypasses + all remaining non-essential maintenance tasks, and stops applying + any cost-based delay that was in effect. Any Buffer Access + Strategy in use will also be disabled. @@ -787,12 +1156,23 @@ HINT: Stop the postmaster and vacuum that database in single-user mode. Updating the Visibility Map - Vacuum maintains a visibility - map for each table to keep track of which pages contain - only tuples that are known to be visible to all active - transactions (and all future transactions, until the page is again - modified). This has two purposes. First, vacuum itself can skip - such pages on the next run, since there is nothing to clean up. + VACUUM maintains a visibility map for each table to keep + track of which pages contain only tuples that are known to be + visible to all active transactions (and all future transactions, + at least until the page is modified). A separate bit tracks + whether all of the tuples are frozen. + + + + The visibility map serves two purposes. + + + + First, VACUUM itself can skip such pages on the + next run, since there is nothing to clean up. Even aggressive VACUUMs + can skip pages that are both all-visible and all-frozen. @@ -812,6 +1192,65 @@ HINT: Stop the postmaster and vacuum that database in single-user mode. 
+ + Truncating transaction status information + + + Anything that influences when and how + relfrozenxid and + relminmxid advance will also directly + affect the high watermark storage overhead needed to store + historical transaction status information. For example, + increasing autovacuum_freeze_max_age (and + vacuum_freeze_table_age along with it) will + make the pg_xact and + pg_commit_ts subdirectories of the database + cluster take more space, because they store the commit/abort + status and (if track_commit_timestamp is enabled) + timestamp of all transactions back to the + datfrozenxid horizon (the earliest + datfrozenxid among all databases in the + cluster). + + + + The commit status uses two bits per transaction. The default + autovacuum_freeze_max_age setting of 200 + million transactions translates to about 50MB of + pg_xact storage. When + track_commit_timestamp is enabled, about 2GB of + pg_commit_ts storage will also be required. + + + + MultiXact ID status information storage uses two separate + underlying SLRU storage areas: + pg_multixact/members, and + pg_multixact/offsets. There is no simple + formula to determine the storage overhead per MultiXact ID, since + in general MultiXact IDs have a variable number of member XIDs. + Note, however, that if pg_multixact/members + exceeds 2GB, then the effective value of + autovacuum_multixact_freeze_max_age used by + VACUUM will be lower, resulting in more + frequent aggressive mode VACUUMs. + + + + Truncation of transaction status information is only possible at + the end of VACUUMs that advance the earliest + relfrozenxid (in the case of + pg_xact and + pg_commit_ts), or the earliest + relminmxid (in the case of + pg_multixact/members and + pg_multixact/offsets) among all tables in the + entire database (assuming that it's the database with the earliest + datfrozenxid and + datminmxid in the entire cluster). 
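The pg_xact and pg_commit_ts storage figures given in this section can be reproduced with simple arithmetic. This is a sketch only: the two bits per transaction comes from the text, while the 10 bytes per pg_commit_ts entry is an assumption chosen to be consistent with the "about 2GB" figure, and the column aliases are illustrative.

```sql
-- Approximate high watermark storage for 200 million transactions.
-- pg_xact: 2 bits per transaction, so bytes = transactions * 2 / 8.
-- pg_commit_ts: assumed 10 bytes per transaction (not stated in the text).
SELECT pg_size_pretty(200000000::bigint * 2 / 8) AS pg_xact_size,
       pg_size_pretty(200000000::bigint * 10)    AS pg_commit_ts_size;
```

Substituting a larger autovacuum_freeze_max_age for the 200 million constant shows how both areas grow linearly with the setting.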
+ + + Updating Planner Statistics @@ -927,7 +1366,7 @@ HINT: Stop the postmaster and vacuum that database in single-user mode. - + diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 10ef699fa..8aa332fcf 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -1515,7 +1515,7 @@ WITH ( MODULUS numeric_literal, REM and/or ANALYZE operations on this table following the rules discussed in . If false, this table will not be autovacuumed, except to prevent - transaction ID wraparound. See for + transaction ID wraparound. See for more about wraparound prevention. Note that the autovacuum daemon does not run at all (except to prevent transaction ID wraparound) if the diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml index f4f6118ac..ede50d6f7 100644 --- a/doc/src/sgml/ref/prepare_transaction.sgml +++ b/doc/src/sgml/ref/prepare_transaction.sgml @@ -128,7 +128,7 @@ PREPARE TRANSACTION transaction_id This will interfere with the ability of VACUUM to reclaim storage, and in extreme cases could cause the database to shut down to prevent transaction ID wraparound (see ). Keep in mind also that the transaction + linkend="freezing-xid-space"/>). Keep in mind also that the transaction continues to hold whatever locks it held. The intended usage of the feature is that a prepared transaction will normally be committed or rolled back as soon as an external transaction manager has verified that diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml index 57bc4c23e..95efe7d36 100644 --- a/doc/src/sgml/ref/vacuum.sgml +++ b/doc/src/sgml/ref/vacuum.sgml @@ -123,7 +123,9 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ aggressive strategy. Specifying FREEZE is equivalent to performing VACUUM with the and @@ -219,7 +221,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ ). However, the + (see ). 
However, the wraparound failsafe mechanism controlled by will generally trigger automatically to avoid transaction ID wraparound failure, and diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml index da2393783..b61d523c2 100644 --- a/doc/src/sgml/ref/vacuumdb.sgml +++ b/doc/src/sgml/ref/vacuumdb.sgml @@ -233,7 +233,7 @@ PostgreSQL documentation ID age of at least mxid_age. This setting is useful for prioritizing tables to process to prevent multixact ID wraparound (see - ). + ). For the purposes of this option, the multixact ID age of a relation is @@ -254,7 +254,7 @@ PostgreSQL documentation transaction ID age of at least xid_age. This setting is useful for prioritizing tables to process to prevent transaction - ID wraparound (see ). + ID wraparound (see ). For the purposes of this option, the transaction ID age of a relation diff --git a/doc/src/sgml/xact.sgml b/doc/src/sgml/xact.sgml index 0762442e1..e372a7875 100644 --- a/doc/src/sgml/xact.sgml +++ b/doc/src/sgml/xact.sgml @@ -185,7 +185,7 @@ rows and can be inspected using the extension. Row-level read locks might also require the assignment of multixact IDs (mxid; see ). + linkend="freezing-xid-space"/>). -- 2.40.1