Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date
Msg-id CAEepm=1XGJVijxqG2EE=3Tb2bbrQRTvnXA6vZN1FkOZNtH=Lqw@mail.gmail.com
In response to Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
On Fri, May 8, 2015 at 6:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 1. The members SLRU is full all the way up to offsetStopLimit.
> 2. A checkpoint occurs, reaching MultiXactSetSafeTruncate(), which
> sets lastCheckpointedOldest.
> 3. Vacuum runs, calling SetMultiXactIdLimit(), calling
> DetermineSafeOldestOffset(), advancing
> MultiXactState->offsetStopLimit.
> 4. Since offsetStopLimit > lastCheckpointedOffset, it's now possible
> for someone to consume an MXID greater than offsetStopLimit, making
> MultiXactState->nextOffset > lastCheckpointedOffset
> 5. The checkpoint from step 1, continuing on its merry way, now calls
> TruncateMultiXact(), which sets rangeEnd > rangeStart and blows away
> nearly every file in the SLRU.
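Steps 1 to 5 can be illustrated with a toy model. It is a simplification under stated assumptions, not PostgreSQL's TruncateMultiXact(). Member space is modeled as a small ring of segments. Truncation keeps only the circular range running from the segment holding the checkpoint-recorded oldest offset to the segment holding nextOffset, and deletes everything else. The segment counts, sizes and function names are all hypothetical.

```python
# Toy model of members truncation on a ring of SLRU segments.
# Hypothetical sizes: the real member offset space is 2**32 entries.
NUM_SEGMENTS = 16
MEMBERS_PER_SEGMENT = 100
SPACE = NUM_SEGMENTS * MEMBERS_PER_SEGMENT


def segment_of(offset):
    """Segment number holding a given member offset (offsets wrap around SPACE)."""
    return (offset % SPACE) // MEMBERS_PER_SEGMENT


def segments_kept(oldest_offset, next_offset):
    """Circular range of segments from oldest_offset's segment to next_offset's."""
    start, end = segment_of(oldest_offset), segment_of(next_offset)
    kept = []
    seg = start
    while True:
        kept.append(seg)
        if seg == end:
            return kept
        seg = (seg + 1) % NUM_SEGMENTS


def segments_deleted(oldest_offset, next_offset):
    """Every segment outside the kept range is removed."""
    kept = set(segments_kept(oldest_offset, next_offset))
    return [s for s in range(NUM_SEGMENTS) if s not in kept]


# Step 1: SLRU full. The start segment (5) is numerically after the
# end segment (4), so the kept range wraps and covers the whole ring.
print(segments_deleted(550, 420))  # []

# Steps 3-5: nextOffset moves past the stale checkpoint value (550), so the
# end segment (6) is now after the start segment (5) and almost everything goes.
print(segments_kept(550, 620), len(segments_deleted(550, 620)))  # [5, 6] 14
```

In this model, deleting "everything outside [start, end]" is harmless while the checkpoint's value is current. It becomes destructive once nextOffset has been allowed to overtake the stale oldest offset the checkpoint captured in step 2.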

I am still working on reproducing this race scenario in various different
ways, including the way you described, but at step 4 I kept getting
stuck: I was unable to create new multixacts despite having
vacuum-frozen all databases (including template0) and advanced the
cluster-wide minimum mxid.

I think I see why, and I think it's a bug: if you vacuum freeze all
your databases, MultiXactState->oldestMultiXactId ends up equal to
MultiXactState->nextMXact.  But that's not a multixact that exists
yet, so when DetermineSafeOldestOffset calls find_multixact_start, it
reads a garbage offset (all zeros in practice, since pages start out
zeroed) and produces a garbage value for offsetStopLimit.  Depending on
where your nextOffset happens to be at the time, that value might
incorrectly stop you from creating any more multixacts even though
member space is entirely empty.  I think the fix is something like "if
nextMXact == oldestMultiXactId, then there are no active multixacts, so
offsetStopLimit should be set to nextOffset - (a segment's worth)".
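The proposed special case can be sketched as a toy model. Everything here is hypothetical: the function names, the find_start callback standing in for find_multixact_start, the simple "one segment before the oldest offset" rule, and the segment size of 52352 members (assumed from 1636 members per 8 kB page times 32 pages per segment). Member offsets are treated as unsigned 32-bit values, so subtraction wraps.

```python
# Toy model of the offsetStopLimit computation; all names are hypothetical.
UINT32 = 1 << 32
MEMBERS_PER_SEGMENT = 52352  # assumption: 1636 members/page * 32 pages/segment


def u32(x):
    """Wrap an integer into the unsigned 32-bit member-offset space."""
    return x % UINT32


def stop_limit_unpatched(oldest_mxact, next_mxact, next_offset, find_start):
    # Always trusts find_start(), even if oldest_mxact has not been assigned yet.
    return u32(find_start(oldest_mxact) - MEMBERS_PER_SEGMENT)


def stop_limit_patched(oldest_mxact, next_mxact, next_offset, find_start):
    if oldest_mxact == next_mxact:
        # No multixacts exist, so there is no stored offset to read;
        # member space is empty and nextOffset is the oldest position in use.
        oldest_offset = next_offset
    else:
        oldest_offset = find_start(oldest_mxact)
    return u32(oldest_offset - MEMBERS_PER_SEGMENT)


def would_block(stop_limit, next_offset, nmembers):
    # Wraparound-aware distance from next_offset forward to stop_limit.
    room = u32(stop_limit - next_offset)
    return nmembers >= room


def zeroed_page(_mxid):
    # Models reading an offset slot that was never written: pages start zeroed.
    return 0


next_offset = 4294914900
mx = 12345  # after freezing everything, oldestMultiXactId == nextMXact
old = stop_limit_unpatched(mx, mx, next_offset, zeroed_page)
new = stop_limit_patched(mx, mx, next_offset, zeroed_page)
print(old, would_block(old, next_offset, 100))  # 4294914944 True
print(new, would_block(new, next_offset, 100))  # 4294862548 False
```

With the zeroed slot read as offset 0, the unpatched limit lands only 44 members ahead of nextOffset, so a 100-member multixact is refused although no multixacts exist. The patched limit leaves almost the whole offset space free. With a different nextOffset the garbage limit can land far away and block nothing, which matches the "depends on where your nextOffset happens to be" behaviour.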

--
Thomas Munro
http://www.enterprisedb.com
