Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From: Robert Haas
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Msg-id: CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
In response to: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-bugs

On Thu, May 7, 2015 at 7:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Now I
>> understand the suggestion that the checkpoint code could be in charge
>> of advancing the oldest multixact + offset.
>
> Yeah. I think we need to pursue that angle unless somebody has a better idea.

So here's a patch that does that.  It turns out to be pretty simple: I
just moved the DetermineSafeOldestOffset() calls around.  As far as I
can see, and that may not be far enough at this hour of the morning,
we need just two calls: one in StartupMultiXact(), so that we initialize
the values correctly after reading the control file; and another at
the very end of TruncateMultiXact(), so that we update them after each
checkpoint or restartpoint.  This leaves the residual problem that
autovacuum doesn't directly advance the stop point; the following
checkpoint does.  We could handle that by requesting a checkpoint
whenever oldestMultiXactId is far enough ahead of lastCheckpointedOldest
that a checkpoint would free up some space, although one might hope that
people won't be living on the edge to quite that degree.
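
For illustration only, the "would a checkpoint free up some space" test
could be sketched as below.  This is not from the patch: the function
name and the TOY_ constants are hypothetical, the sizes assume the
default 8 kB block size (1636 members per page, 32 pages per SLRU
segment), and it assumes the member offsets belonging to
oldestMultiXactId and lastCheckpointedOldest have already been looked
up and that the oldest offset only ever moves forward.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for the default 8 kB block size; verify against your build. */
#define TOY_MEMBERS_PER_PAGE     1636u
#define TOY_PAGES_PER_SEGMENT    32u
#define TOY_MEMBERS_PER_SEGMENT  (TOY_MEMBERS_PER_PAGE * TOY_PAGES_PER_SEGMENT)

/*
 * Hypothetical helper: return true if the oldest member offset has moved
 * into a different segment than the one recorded at the last checkpoint.
 * In that case at least one whole segment file no longer holds live
 * members, so a checkpoint could remove it.
 */
bool
toy_checkpoint_would_free_space(uint32_t oldest_offset,
                                uint32_t checkpointed_offset)
{
    uint32_t old_seg = checkpointed_offset / TOY_MEMBERS_PER_SEGMENT;
    uint32_t new_seg = oldest_offset / TOY_MEMBERS_PER_SEGMENT;

    /* Comparing for inequality also covers the case where offsets wrapped. */
    return new_seg != old_seg;
}
```

Comparing segment numbers rather than raw offsets avoids asking for a
checkpoint that could not remove any file.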

As things are, I believe this sequence is possible:

1. The members SLRU is full all the way up to offsetStopLimit.
2. A checkpoint occurs, reaching MultiXactSetSafeTruncate(), which
sets lastCheckpointedOldest.
3. Vacuum runs, calling SetMultiXactIdLimit(), calling
DetermineSafeOldestOffset(), advancing
MultiXactState->offsetStopLimit.
4. Since offsetStopLimit > lastCheckpointedOffset (that is, the member
offset of lastCheckpointedOldest), it's now possible for someone to
create a multixact whose members extend past the old offsetStopLimit,
making MultiXactState->nextOffset > lastCheckpointedOffset.
5. The checkpoint from step 2, continuing on its merry way, now calls
TruncateMultiXact(), which sets rangeEnd > rangeStart and blows away
nearly every file in the SLRU.
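
A toy model of the sequence above, in whole-segment units on a
hypothetical 16-segment circular space.  Nothing here is PostgreSQL
code: the keep/remove test is only loosely modeled on a
wraparound-aware directory scan, and every name is made up for the
example.  It counts the live segments that a truncation would delete
when it pairs a stale range start with a freshly read range end.

```c
#include <stdbool.h>

#define TOY_NUM_SEGMENTS 16     /* hypothetical tiny circular segment space */

/* Would a scan with this keep-range delete segment seg? */
bool
toy_segment_is_removed(int seg, int range_start, int range_end)
{
    if (range_start < range_end)
        return seg < range_start || seg > range_end;  /* range does not wrap */
    return seg > range_end && seg < range_start;      /* range wraps around */
}

/* Is seg inside the circular range [oldest, next] that really holds data? */
bool
toy_segment_is_live(int seg, int oldest, int next)
{
    if (oldest <= next)
        return seg >= oldest && seg <= next;
    return seg >= oldest || seg <= next;
}

/*
 * Count live segments deleted when truncation uses stale_start (recorded
 * by the checkpoint before vacuum ran) together with the current next.
 */
int
toy_count_lost_segments(int stale_start, int true_oldest, int next)
{
    int seg;
    int lost = 0;

    for (seg = 0; seg < TOY_NUM_SEGMENTS; seg++)
    {
        if (toy_segment_is_live(seg, true_oldest, next) &&
            toy_segment_is_removed(seg, stale_start, next))
            lost++;
    }
    return lost;
}
```

With the stale start at segment 5, the true oldest at 12 and the next
offset at 6, toy_count_lost_segments(5, 12, 6) reports 9 live segments
deleted.  If nobody consumes anything after vacuum (next stays at 4), or
if the truncation uses the up-to-date start, it reports 0.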

I haven't confirmed this yet, so I might still be all wet, especially
since it is late here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
