Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-general

From Noah Misch
Subject Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date
Msg-id 20150601044611.GA23587@tornado.leadboat.com
Whole thread Raw
In response to Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-general
Incomplete review, done in a relative rush:

On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
> OK, here's a patch.  Actually two patches, differing only in
> whitespace, for 9.3 and for master (ha!).  I now think that the root
> of the problem here is that DetermineSafeOldestOffset() and
> SetMultiXactIdLimit() were largely ignorant of the possibility that
> they might be called at points in time when the cluster was
> inconsistent.

A cause perhaps closer to the root is commit f741300 moving truncation from
VACUUM to checkpoints.  CLOG has given us deep experience with VACUUM-time
truncation.  Commit f6a6c46d and this patch are about bringing CHECKPOINT-time
truncation up to the same level.

Speaking of commit f6a6c46d, it seems logical that updating allocation stop
limits should happen proximate to truncation.  That's currently the case for
CLOG (vac_truncate_clog() does both) and pg_multixact/members (checkpoint's
TruncateMultiXact() call does both).  However, pg_multixact/offsets is
truncated from TruncateMultiXact(), but vac_truncate_clog() updates its limit.
I did not distill an errant test case, but this is fishy.

> SetMultiXactIdLimit() bracketed certain parts of its
> logic with if (!InRecovery), but those guards were ineffective because
> it gets called before InRecovery is set in the first place.

SetTransactionIdLimit() checks InRecovery for the same things, and it is
called at nearly the same moments as SetMultiXactIdLimit().  Do you have a
sense of whether it is subject to similar problems as a result?

> 1. Moves the call to DetermineSafeOldestOffset() that appears in
> StartupMultiXact() to TrimMultiXact(), so that we don't try to do this
> until we're consistent.  Also, instead of passing
> MultiXactState->oldestMultiXactId, pass the newer of that value and
> the earliest offset that exists on disk.  That way, it won't try to
> read data that's not there.

Perhaps emit a LOG message when we do that, since it's our last opportunity to
point to the potential data loss?

> +     * PostgreSQL 9.3.0 through 9.3.6 and PostgreSQL 9.4.0 through 9.4.1
> +     * had bugs that could allow users who reached those release through

s/release/releases/

> @@ -2859,6 +2947,14 @@ TruncateMultiXact(void)
>      SimpleLruTruncate(MultiXactOffsetCtl,
>                        MultiXactIdToOffsetPage(oldestMXact));
>
> +    /* Update oldest-on-disk value in shared memory. */
> +    earliest = range.rangeStart * MULTIXACT_OFFSETS_PER_PAGE;
> +    if (earliest < FirstMultiXactId)
> +        earliest = FirstMultiXactId;
> +    LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
> +    Assert(MultiXactState->oldestMultiXactOnDiskValid);
> +    MultiXactState->oldestMultiXactOnDiskValid = earliest;

That last line needs s/Valid//, I presume.  Is it okay that
oldestMultiXactOnDisk becomes too-old during TruncateMultiXact(), despite its
Valid indicator remaining true?

> +static MultiXactOffset
> +GetOldestMultiXactOnDisk(void)
> +{

> +    SlruScanDirectory(MultiXactOffsetCtl, SlruScanDirCbFindEarliest, &trunc);
> +    earliest = trunc.earliestExistingPage * MULTIXACT_OFFSETS_PER_PAGE;
> +    if (earliest < FirstMultiXactId)
> +        earliest = FirstMultiXactId;

SlruScanDirCbFindEarliest() is only meaningful if the files on disk do not
represent a wrapped state.  When the files do represent a wrapped state,
MultiXactIdPrecedes() is not transitive, and the SlruScanDirCbFindEarliest()
result is sensitive to readdir() order.  This code exists specifically to help
us make do in the face of wrong catalog and pg_control entries.  We may have
wrapped as a result of those of same catalog/pg_control entries, so I think
this function ought to account for wrap.  I haven't given enough thought to
what exactly it should do.

Thanks,
nm


pgsql-general by date:

Previous
From: Michael Paquier
Date:
Subject: Re: JSONB matching element count
Next
From: Noah Misch
Date:
Subject: Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1