Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date
Msg-id CAEepm=3C32VPJLOo45y0c3-3KWXNV2xM4jaPTSVjCRD2VG0Qgg@mail.gmail.com
Whole thread Raw
In response to Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
On Sat, May 9, 2015 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 8, 2015 at 9:55 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Thomas Munro wrote:
>>> I think the fix is something like "if nextMXact == oldestMultiXactId,
>>> then there are no active multixacts, so the offsetStopLimit should be
>>> set to nextOffset - (a segment's worth)".
>>
>> Makes sense.
>
> Here's a patch that attempts to implement this.

Thanks.  I think I have managed to reproduce something like the data
loss race that we were speculating about.

0.  initdb, autovacuum = off, set up explode_mxact_members.c as
described elsewhere in the thread.
1.  Fill up the members SLRU completely (ie reach state where you can
no longer create a new multixact of any size).  pg_multixact/members
contains 82040 files and the last one is named 14077.
2.  Issue CHECKPOINT, but use a debugger to stop inside
TruncateMultiXact after it has read
MultiXactState->lastCheckpointedOldest and released the lock, but
before it calls SlruScanDirectory to delete files...
3.  Run VACUUM FREEZE in all databases (including template0).  datminmxid moves.
4.  Create lots of new multixacts.  pg_multixact/members now contains
82041 files and the last one is named 14078 (ie one extra segment,
with the highest possible segment number, which couldn't be created
before vacuuming because of the one segment gap enforced by
DetermineSafeOldestOffset).  Segments 0000-0016 have new modified
times.
5.  ... allow the checkpoint started in step 2 to continue.  It
deletes segments, keeping only 0000-0016.  The segment 14078 which
contained active member data has been incorrectly deleted.

--
Thomas Munro
http://www.enterprisedb.com

pgsql-bugs by date:

Previous
From: Robert Haas
Date:
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Next
From: Robert Haas
Date:
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)