pgsql: Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmx - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmx
Date
Msg-id E1X9FjT-0008UF-F8@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmxid values.

In commit a61daa14d56867e90dc011bbba52ef771cea6770, we fixed pg_upgrade so
that it would install sane relminmxid and datminmxid values, but that does
not cure the problem for installations that were already pg_upgraded to
9.3; they'll initially have "1" in those fields.  This is not a big problem
so long as 1 is "in the past" compared to the current nextMultiXact
counter.  But if an installation were more than halfway to the MXID wrap
point at the time of upgrade, 1 would appear to be "in the future" and
that would effectively disable tracking of oldest MXIDs in those
tables/databases, until such time as the counter wrapped around.

While in itself this isn't worse than the situation pre-9.3, where we did
not manage MXID wraparound risk at all, the consequences of premature
truncation of pg_multixact are worse now; so we ought to make some effort
to cope with this.  We discussed advising users to fix the tracking values
manually, but that seems both very tedious and very error-prone.

Instead, this patch adopts two amelioration rules.  First, a relminmxid
value that is "in the future" is allowed to be overwritten with a
full-table VACUUM's actual freeze cutoff, ignoring the normal rule that
relminmxid should never go backwards.  (This essentially assumes that we
have enough defenses in place that wraparound can never occur anymore,
and thus that a value "in the future" must be corrupt.)  Second, if we see
any "in the future" values then we refrain from truncating pg_clog and
pg_multixact.  This prevents loss of clog data until we have cleaned up
all the broken tracking data.  In the worst case that could result in
considerable clog bloat, but in practice we expect that relfrozenxid-driven
freezing will happen soon enough to fix the problem before clog bloat
becomes intolerable.  (Users could do manual VACUUM FREEZEs if not.)

Note that this mechanism cannot save us if there are already-wrapped or
already-truncated-away MXIDs in the table; it's only capable of dealing
with corrupt tracking values.  But that's the situation we have with the
pg_upgrade bug.

For consistency, apply the same rules to relfrozenxid/datfrozenxid.  There
are not known mechanisms for these to get messed up, but if they were, the
same tactics seem appropriate for fixing them.

Branch
------
REL9_4_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/d122387d7d9149fdb9589ebdaf52c763464e4cd5

Modified Files
--------------
src/backend/commands/vacuum.c |  125 ++++++++++++++++++++++++++++++++---------
1 file changed, 99 insertions(+), 26 deletions(-)


pgsql-committers by date:

Previous
From: Magnus Hagander
Date:
Subject: pgsql: Properly use DEFAULT_EVENT_SOURCE in pgevent.c
Next
From: Tom Lane
Date:
Subject: pgsql: Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmx