Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From	Robert Haas
Subject	Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date	May 9, 2015 12:43:56
Msg-id	8486B09E-773B-4838-A7E8-8E48433245E1@gmail.com
In response to	Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
List	pgsql-bugs

On May 9, 2015, at 8:00 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> On Sat, May 9, 2015 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, May 8, 2015 at 9:55 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Thomas Munro wrote:
>>>> I think the fix is something like "if nextMXact == oldestMultiXactId,
>>>> then there are no active multixacts, so the offsetStopLimit should be
>>>> set to nextOffset - (a segment's worth)".
>>>
>>> Makes sense.
>>
>> Here's a patch that attempts to implement this.
>
> Thanks.  I think I have managed to reproduce something like the data
> loss race that we were speculating about.
>
> 0.  initdb, autovacuum = off, set up explode_mxact_members.c as
> described elsewhere in the thread.
> 1.  Fill up the members SLRU completely (ie reach state where you can
> no longer create a new multixact of any size).  pg_multixact/members
> contains 82040 files and the last one is named 14077.
> 2.  Issue CHECKPOINT, but use a debugger to stop inside
> TruncateMultiXact after it has read
> MultiXactState->lastCheckpointedOldest and released the lock, but
> before it calls SlruScanDirectory to delete files...
> 3.  Run VACUUM FREEZE in all databases (including template0).  datminmxid moves.
> 4.  Create lots of new multixacts.  pg_multixact/members now contains
> 82041 files and the last one is named 14078 (ie one extra segment,
> with the highest possible segment number, which couldn't be created
> before vacuuming because of the one segment gap enforced by
> DetermineSafeOldestOffset).  Segments 0000-0016 have new modified
> times.
> 5.  ... allow the checkpoint started in step 2 to continue.  It
> deletes segments, keeping only 0000-0016.  The segment 14078 which
> contained active member data has been incorrectly deleted.
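
The file counts in steps 1 and 4 are consistent with a 32-bit member offset space. The sketch below checks that arithmetic. The constants are assumptions that do not appear in this thread (default 8 kB pages holding 1636 members each, 32 pages per SLRU segment), and the helper names are made up for illustration.

```python
# Assumed constants, not stated in the thread (default 8 kB block size).
MEMBERS_PER_PAGE = 1636      # assumption: members that fit on one 8 kB page
PAGES_PER_SEGMENT = 32       # assumption: SLRU pages per segment file
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * PAGES_PER_SEGMENT
MAX_OFFSET = 0xFFFFFFFF      # member offsets are 32-bit values


def segment_of(offset):
    """Segment number that holds the given member offset."""
    return offset // MEMBERS_PER_SEGMENT


def segment_name(segno):
    """File name for a segment: upper-case hex, at least four digits."""
    return format(segno, "04X")


last_segment = segment_of(MAX_OFFSET)
print(segment_name(last_segment))      # 14078, the extra segment in step 4
print(last_segment + 1)                # 82041 files once that segment exists
print(segment_name(last_segment - 1))  # 14077, the last file in step 1
```

Under these assumptions the highest possible segment is only partly usable, which fits the report that it could not be created until vacuuming had moved datminmxid.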

OK. So the next question is: if you then apply the other patch, does that
prevent step 4 and thereby avoid catastrophe?

...Robert
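
The sequence in steps 1 to 5 can be read as a check-then-act race: the truncation decides what to keep using an oldest value read before the vacuum and the wraparound happened. What follows is only a toy model of that reading, not PostgreSQL's code. The function names, the stale oldest segment (0) and the true oldest segment (0x14078) are assumptions chosen to match the file names reported above.

```python
LAST_SEG = 0x14078  # highest segment number, as reported in step 4


def kept_segments(oldest_seg, next_seg):
    """Segments in the circular range oldest_seg..next_seg, inclusive."""
    kept = set()
    seg = oldest_seg
    while True:
        kept.add(seg)
        if seg == next_seg:
            return kept
        # Advance by one segment, wrapping past the highest segment.
        seg = 0 if seg == LAST_SEG else seg + 1


def truncate(on_disk, oldest_seg, next_seg):
    """Return the files that survive: everything outside the range is removed."""
    return on_disk & kept_segments(oldest_seg, next_seg)


# Step 1: members is full, files 0000 through 14077 exist.
on_disk = set(range(0x14077 + 1))

# Step 2: the checkpoint reads the oldest value, then stalls.
stale_oldest = 0          # assumption: oldest members were in segment 0000

# Steps 3 and 4: vacuum moves the real oldest point, new multixacts
# fill 14078 and wrap around into 0000-0016.
true_oldest = 0x14078     # assumption: older members are no longer needed
on_disk.add(0x14078)
next_seg = 0x16

# Step 5: the stalled checkpoint resumes with its stale value.
after_race = truncate(on_disk, stale_oldest, next_seg)
after_fresh = truncate(on_disk, true_oldest, next_seg)

print(0x14078 in after_race)   # False: the live segment was removed
print(0x14078 in after_fresh)  # True: a fresh read would have kept it
```

In this model the loss needs both halves, the stale read and the wraparound in step 4, so blocking step 4 would be enough to avoid it here. Whether the other patch actually does that in the server is the open question above.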
