Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date
Msg-id CAEepm=3+=SP3eEv3s3EmVg7HeppaKP-ixMenVgMfHQCCEY=f-Q@mail.gmail.com
In response to Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
On Tue, Apr 28, 2015 at 4:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 27, 2015 at 10:59 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> I think we can easily determine the rate of multixact member space
>> consumption and compare to the rate of multixact ID consumption;
>> considering the historical multixact size (number of members per
>> multixact) it would be possible to change the freeze ages by the same
>> fraction, so that autovac effectively behaves as if the members
>> consumption rate is what is driving the freezing instead of ID
>> consumption rate.  That way, we don't have to jump suddenly from
>> "normal" to "emergency" behavior at some fixed threshold.
>
> Right.  I think that not jumping from normal mode to emergency mode is
> quite important, and was trying to describe a system that would
> gradually ramp up the pressure rather than a system that would do
> nothing for a while and then suddenly go ballistic.
>
> With regard to what you've outlined here, we need to make sure that if
> the multixact rate varies widely, we still clean up before we hit
> autovac wraparound.  That's why I think it should be driven off of the
> fraction of the available address space which is currently consumed,
> not some kind of short term measure of mxact size or generation rate.
> I'm not sure exactly what you have in mind here.

Here is a work-in-progress patch that:

(1) Fixes the boundary bug already discussed (as before, except that I
fixed up the comments in MultiXactOffsetWouldWrap() to survive
pgindent based on feedback from Kevin).

(2) Makes autovacuum adjust the effective value of
autovacuum_multixact_freeze_max_age down to zero as the fraction of
used addressable member offset range approaches 75%, and also makes
vacuum_multixact_freeze_min_age use the same value if it is lower than
the configured min age.

When I run this with my explode_mxact_members.c program, autovacuum
moves datminmxid forwards, avoiding the error.  The algorithm in
autovacuum_multixact_freeze_max_age_adjusted is possibly too simple,
but it should at least spread out wraparound autovacuums rather than
suddenly going ballistic.
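
As a rough illustration of item (2), the scaling could look something like
the sketch below.  This is not the code from the attached patch: the helper
names, the linear ramp starting from an empty member space, and the
DANGER_THRESHOLD constant are all assumptions made for the example.  Only
the 75% endpoint and the "use the lower of the two values" rule for the min
age come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* From the email: the effective max age reaches zero at 75% usage. */
#define DANGER_THRESHOLD UINT32_C(0xC0000000)	/* 75% of 2^32 offsets */

/*
 * Hypothetical helper: scale the configured freeze max age linearly from
 * its full value (no member space used) down to zero (75% used).
 */
static int
freeze_max_age_adjusted(int configured_max_age,
						uint32_t oldest_offset, uint32_t next_offset)
{
	uint32_t	used;

	assert(configured_max_age >= 0);

	/* Unsigned subtraction gives the used span even after a wrap. */
	used = next_offset - oldest_offset;
	if (used >= DANGER_THRESHOLD)
		return 0;

	/* The 64-bit intermediate keeps the multiplication from overflowing. */
	return (int) (((uint64_t) configured_max_age *
				   (DANGER_THRESHOLD - used)) / DANGER_THRESHOLD);
}

/*
 * Hypothetical helper: the min age follows the adjusted max age only when
 * the adjusted value is lower than the configured min age.
 */
static int
freeze_min_age_effective(int configured_min_age, int adjusted_max_age)
{
	return (adjusted_max_age < configured_min_age) ?
		adjusted_max_age : configured_min_age;
}
```

With a configured max age of 400 million, this sketch yields 200 million
when 37.5% of the offset range is in use and 0 at 75% or beyond, so the
pressure on autovacuum rises gradually instead of switching on at a single
threshold.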

Do you think it's OK that MultiXactAdvanceNextMXact still uses
MultiXactOffsetPrecedes to compare two member offsets?  Can the value
from an XLog record being replayed be more than 2.2 billion away from
MultiXactState->nextOffset?
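
For context on why the distance matters, here is a minimal sketch of a
circular 32-bit "precedes" comparison of the kind the question is about.
The function name offset_precedes and the demo function are hypothetical,
and the body is an assumption about the general idiom rather than a quote
of the PostgreSQL source.  Such a comparison is only meaningful while the
two offsets are less than 2^31 (2,147,483,648) apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical circular comparison: treats the 32-bit offset space as a
 * ring and reports whether a comes before b.  The cast of the unsigned
 * difference to int32_t relies on two's-complement conversion, which is
 * implementation-defined in C but is what common compilers provide.
 */
static bool
offset_precedes(uint32_t a, uint32_t b)
{
	int32_t		diff = (int32_t) (a - b);

	return diff < 0;
}

/* Small demonstration of where the comparison holds and where it flips. */
static void
offset_precedes_demo(void)
{
	/* Ordinary case: 10 comes before 20. */
	assert(offset_precedes(10, 20));

	/* Across the wrap point: an offset near the top precedes a small one. */
	assert(offset_precedes(UINT32_C(0xFFFFFFF0), 5));

	/* Just under 2^31 apart: still ordered as expected. */
	assert(offset_precedes(0, UINT32_C(0x7FFFFFFF)));

	/* More than 2^31 apart: the answer flips, 0 no longer "precedes". */
	assert(!offset_precedes(0, UINT32_C(0x80000001)));
}
```

If a replayed value could be further than that from nextOffset, a
comparison of this shape would report the opposite ordering, which is the
concern raised in the question above.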

(The attachment memberswrap-2.patch is a diff against master, and
memberswrap-2-incremental.patch is a diff against master with Alvaro's
patch applied).

--
Thomas Munro
http://www.enterprisedb.com
