Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-general

From Thomas Munro
Subject Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Msg-id CAEepm=1ffsDJuUBBj6__8A7pcQNnNBm1+R1844=zZDWrECfKog@mail.gmail.com
In response to Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Fri, May 29, 2015 at 11:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> A. Most obviously, we should fix pg_upgrade so that it installs
> chkpnt_oldstMulti instead of chkpnt_nxtmulti into datfrozenxid, so
> that we stop creating new instances of this problem.  That won't get
> us out of the hole we've dug for ourselves, but we can at least try to
> stop digging.  (This is assuming I'm right that chkpnt_nxtmulti is the
> wrong thing - anyone want to double-check me on that one?)

Yes, it seems like this could lead to truncation of multixacts still
referenced by tuples, causing errors when updating, locking, or
vacuuming.  Why don't we have reports of that?

> B. We need to change find_multixact_start() to fail softly.  This is
> important because it's legitimate for it to fail in recovery, as
> discussed upthread, and also because we probably want to eliminate the
> fail-to-start hazard introduced in 9.4.2 and 9.3.7.
> find_multixact_start() is used in three places, and they each require
> separate handling:

Here is an experimental WIP patch that changes StartupMultiXact and
SetMultiXactIdLimit to find the oldest multixact that exists on disk
(by scanning the directory), and uses that if it is more recent than
the oldestMultiXactId from shmem, when calling
DetermineSafeOldestOffset.  I'm not all that happy with it, see below,
but let me know what you think.
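
To make the "use that if it is more recent" rule concrete, here is a
minimal standalone sketch, not the patch itself.  The helper names
mxid_precedes and effective_oldest_mxid are hypothetical; the
comparison is the usual modulo-2^32 one, in the style of
MultiXactIdPrecedes.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/*
 * Wraparound-aware "a is older than b".  Hypothetical helper; it uses
 * the same signed-difference trick as MultiXactIdPrecedes.
 */
static bool
mxid_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: of the oldest multixact recorded in shared
 * memory and the earliest one found on disk, return the more recent.
 */
static MultiXactId
effective_oldest_mxid(MultiXactId shmem_oldest, MultiXactId earliest_on_disk)
{
    if (mxid_precedes(shmem_oldest, earliest_on_disk))
        return earliest_on_disk;
    return shmem_oldest;
}
```

The result would then be what gets passed on to
DetermineSafeOldestOffset.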

Using unpatched master, I reproduced the startup error with a bit of a
short cut:

1.  initdb, generate enough multixacts to get more than one offsets file
2.  ALTER DATABASE template0 ALLOW_CONNECTIONS = true;, then
vacuumdb --freeze --all, then CHECKPOINT
3.  verify that pg_control now holds a large oldestMultiXactId, and
note NextMultiXactId
4.  shutdown, pg_resetxlog -m (NextMultiXactId from 3),1 pg_data
5.  start up: fails

Apply this patch, and it starts up successfully.

What are the repro steps for the replay problem?  Is a basebackup of a
large database undergoing truncation and some good timing needed?

> - In SetMultiXactIdLimit, find_multixact_start() is used to set
> MultiXactState->oldestOffset, which is used to determine how
> aggressively to vacuum.  If find_multixact_start() fails, we don't
> know how aggressively we need to vacuum to prevent members wraparound;
> it's probably best to decide to vacuum as aggressively as possible.
> Of course, if we're in recovery, we won't vacuum either way; the fact
> that it fails softly is good enough.

Isn't it enough to use the start offset of whichever is more recent:
the oldest multixact ID from shmem, or the oldest multixact found by
scanning pg_multixact/offsets?  This patch does that, but I'm not
happy with when the work is done: it just doesn't seem right for
SetMultiXactIdLimit to be scanning that directory.  The result of that
operation should only change when files have been truncated anyway,
and the truncate code was already doing a filesystem scan.  Maybe the
truncate code should store the earliest multixact ID found on disk in
shared memory, so that SetMultiXactIdLimit can use it for free.  I
tried to get that working but couldn't figure out where it should be
initialised -- StartupMultiXact is too late (StartupXLOG calls
SetMultiXactIdLimit before that), but BootstrapMultiXact and
MultiXactShmemInit didn't seem like the right places either.

> - In DetermineSafeOldestOffset, find_multixact_start() is used to set
> MultiXactState->offsetStopLimit.  If it fails here, we don't know when
> to refuse multixact creation to prevent wraparound.  Again, in
> recovery, that's fine.  If it happens in normal running, it's not
> clear what to do.  Refusing multixact creation is an awfully blunt
> instrument.  Maybe we can scan pg_multixact/offsets to determine a
> workable stop limit: the first file greater than the current file that
> exists, minus two segments, is a good stop point.  Perhaps we ought to
> use this mechanism here categorically, not just when
> find_multixact_start() fails.  It might be more robust than what we
> have now.

Done in this patch -- the truncate code calls
DetermineSafeOldestOffset with the earliest SLRU found by scanning if
that's more recent than the shmem value, and then
DetermineSafeOldestOffset applies the step-back-one-whole-segment
logic to that as before.
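
For illustration, the step-back logic can be sketched roughly as
below.  This is a simplified standalone version under stated
assumptions, not the patch: safe_stop_limit and members_per_segment
are hypothetical names, and the arithmetic deliberately relies on
unsigned 32-bit wraparound.

```c
#include <stdint.h>

typedef uint32_t MultiXactOffset;

/*
 * Hypothetical helper: given the member offset at which the oldest
 * multixact starts, return the offset at which creating new members
 * should stop.  members_per_segment is a parameter here because its
 * value depends on the block size.
 */
static MultiXactOffset
safe_stop_limit(MultiXactOffset oldest_offset, uint32_t members_per_segment)
{
    /* Round down to the start of the segment holding oldest_offset. */
    oldest_offset -= oldest_offset % members_per_segment;

    /* Leave one whole segment free before the wraparound point. */
    oldest_offset -= members_per_segment;

    return oldest_offset;
}
```

With the default 8 kB block size, members_per_segment should be
1636 * 32, if I have the members-per-page figure right.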

> - In TruncateMultiXact, find_multixact_start() is used to set the
> truncation point for the members SLRU.  If it fails here, I'm guessing
> the right solution is not to truncate anything - instead, rely on
> intense vacuuming to eventually advance oldestMXact to a value whose
> member data still exists; truncate then.

TruncateMultiXact already contained logic to do nothing at all if
oldestMXact is older than the earliest it can find on disk.  I moved
that code into find_earliest_multixact_on_disk() to be able to use it
elsewhere too, in this patch.
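
As a rough idea of what such a scan computes, here is a standalone
sketch that works on a list of file names rather than a real
directory.  It is not the code from the patch:
earliest_mxid_in_listing and parse_segment_name are hypothetical
names, and the constants assume 8 kB blocks (2048 offsets per page, 32
pages per segment).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t MultiXactId;

#define OFFSETS_PER_PAGE   2048u   /* assumes 8 kB blocks, 4-byte offsets */
#define PAGES_PER_SEGMENT  32u
#define MXIDS_PER_SEGMENT  (OFFSETS_PER_PAGE * PAGES_PER_SEGMENT)

/* Accept only names that look like SLRU segments: 4 uppercase hex digits. */
static bool
parse_segment_name(const char *name, uint32_t *segno)
{
    if (strlen(name) != 4 || strspn(name, "0123456789ABCDEF") != 4)
        return false;
    *segno = (uint32_t) strtoul(name, NULL, 16);
    return true;
}

/*
 * Hypothetical helper: return true and set *result to the first
 * multixact ID covered by the earliest segment in the listing, or
 * return false if no segment file names were found.
 */
static bool
earliest_mxid_in_listing(const char *const *names, size_t n, MultiXactId *result)
{
    bool        found = false;
    MultiXactId best = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint32_t    segno;
        MultiXactId first;

        if (!parse_segment_name(names[i], &segno))
            continue;           /* skip ".", "..", stray files */

        first = segno * MXIDS_PER_SEGMENT;

        /* wraparound-aware "first is older than best" */
        if (!found || (int32_t) (first - best) < 0)
        {
            best = first;
            found = true;
        }
    }

    /* Multixact 0 is invalid, so segment 0000 really starts at 1. */
    if (found)
        *result = (best == 0) ? 1 : best;
    return found;
}
```

A caller can then skip truncation entirely when the function returns
false, or when oldestMXact is older than the value it reports.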

> C. I think we should also change TruncateMultiXact() to truncate
> offsets first, and then members.  As things stand, if we truncate
> members first, we increase the risk of seeing an offset that will fail
> when passed to find_multixact_start(), because TruncateMultiXact()
> might get interrupted before it finishes.  That seems like an
> unnecessary risk.

I don't see why the order matters.  find_multixact_start() doesn't
read the members, only the offsets SLRU (i.e. the index into members,
not the contents of members).  As I understand it, the only time we
need to access the members themselves is when we encounter multixacts
in tuple headers (updating, locking or vacuuming).  If you have
truncated multixacts referenced in your tuples then you have a
different form of corruption than the
pg_upgrade-tramples-on-oldestMultiXactId case we're trying to handle
gracefully here.

--
Thomas Munro
http://www.enterprisedb.com

Attachment
