Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Msg-id 20150427145909.GU4369@alvh.no-ip.org
In response to Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Robert Haas <robertmhaas@gmail.com>)
Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (David Gould <daveg@sonic.net>)
List pgsql-bugs
Robert Haas wrote:
> On Thu, Apr 23, 2015 at 9:59 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Thomas Munro wrote:
> >> That's why I proposed not using xid-like logic, and instead using a
> >> type of three-way comparison that allows you to see when nextOffset
> >> would 'cross' oldestOffsetStopLimit, instead of the two-way comparison
> >> that considers half the number-space to be in the past and half in the
> >> future, in my earlier message.
> >
> > Yeah, that bit made sense to me.
>
> In addition to preventing the corruption, I think we also need a
> back-patchable fix for AV to try to keep this situation from happening
> in the first place.
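
To make the two-way versus three-way distinction concrete, here is a minimal Python sketch. It is not PostgreSQL's code: the helper names `precedes_two_way` and `would_overrun` are hypothetical, and 32-bit unsigned member offsets are modelled by masking Python integers. The two-way test can only say which of two values is "ahead" within a half-circle window, so once the real distance exceeds 2^31 its answer flips. The three-way test instead asks whether the run of member slots starting at nextOffset would land on oldestOffsetStopLimit.

```python
MASK = 0xFFFFFFFF  # member offsets are 32-bit unsigned and wrap around
HALF = 0x80000000  # 2**31: the sign bit of a 32-bit integer


def precedes_two_way(a, b):
    """xid-style test: a precedes b iff (int32)(a - b) < 0.

    Half of the number space counts as 'past' and half as 'future',
    so the answer is only meaningful while |a - b| < 2**31.
    """
    return ((a - b) & MASK) >= HALF


def would_overrun(start, n, boundary):
    """Three-way test: would writing n members at positions
    start, start+1, ..., start+n-1 (mod 2**32) land on boundary?

    The forward distance from start to boundary is computed modulo
    2**32, so there is no half-space ambiguity.
    """
    return ((boundary - start) & MASK) < n
```

For example, with `start = 50`, `boundary = 100` and an advance of `n = HALF + 100`, the new position `(start + n) & MASK` compares as *preceding* the boundary under the two-way test (it looks safe), while `would_overrun(50, n, 100)` correctly reports that the limit was crossed.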

Let me push a patch to fix the corruption, and then we can think of ways
to teach autovacuum about the problem.  I'm not optimistic about that,
honestly: as with all GUC settings, these values are private to each
process, and there's no way for one process to change the values seen by
other processes (the autovac workers).  The only idea that comes to mind
is to publish the values in shared memory and have autovac workers read
them from there instead of using the normal GUC values.
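
A rough sketch of the shared-memory idea, with Python's `multiprocessing.Value` standing in for a field in a PostgreSQL shared-memory struct. Everything here is hypothetical: the slot, the `NO_OVERRIDE` sentinel, and the functions `publish_effective_max_age` and `effective_freeze_max_age` are illustrative names, not existing PostgreSQL interfaces. The design choice shown is that a published value can only make a worker *more* aggressive than its own GUC setting.

```python
import multiprocessing as mp

NO_OVERRIDE = -1  # sentinel: nothing published, use the local GUC value


def make_shared_slot():
    """Allocate one signed 64-bit integer in shared memory.

    Hypothetical stand-in for a field in a shared-memory struct; the
    synchronized Value carries its own lock.
    """
    return mp.Value("q", NO_OVERRIDE)


def publish_effective_max_age(slot, value):
    """Called by whichever process decides autovacuum must be more aggressive."""
    with slot.get_lock():
        slot.value = value


def effective_freeze_max_age(slot, local_guc_value):
    """Called by an autovac worker instead of reading its GUC directly."""
    with slot.get_lock():
        published = slot.value
    if published == NO_OVERRIDE:
        return local_guc_value
    # A published value may only make freezing more aggressive, never less.
    return min(local_guc_value, published)
```

Because the slot is backed by shared memory, passing it to `mp.Process(target=..., args=(slot, ...))` lets a child process see values the parent publishes later; a plain per-process variable, which is the GUC situation described above, would not.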

> What I think we should do is notice when members utilization exceeds
> offset utilization and progressively ramp back the effective value of
> autovacuum_multixact_freeze_max_age (and maybe also
> vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age)
> so that autovacuum (and maybe also manual vacuums) get progressively
> more aggressive about trying to advance relminmxid.  Suppose we decide
> that when the "members" space is 75% used, we've got a big problem and
> want to treat autovacuum_multixact_freeze_max_age to effectively be
> zero.
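
Robert's proposal can be sketched as a function of the two utilization fractions. This is one possible reading, not an agreed design. The linear ramp, the function name `ramped_freeze_max_age`, and the choice to start ramping exactly where members utilization overtakes offset utilization are all assumptions; only the 75% figure comes from the message.

```python
EMERGENCY_MEMBERS_FRACTION = 0.75  # from the message: at 75% members used, treat max age as 0


def ramped_freeze_max_age(configured, members_used_frac, offsets_used_frac):
    """Hypothetical effective autovacuum_multixact_freeze_max_age.

    members_used_frac / offsets_used_frac are the fractions (0.0-1.0) of
    the members and offsets spaces currently in use.
    """
    if members_used_frac >= EMERGENCY_MEMBERS_FRACTION:
        return 0  # emergency: freeze as aggressively as possible
    if members_used_frac <= offsets_used_frac:
        return configured  # members space is not the bottleneck
    # Linear ramp: full value where members == offsets, zero at 75%.
    # Here offsets_used_frac < members_used_frac < 0.75, so span > 0.
    span = EMERGENCY_MEMBERS_FRACTION - offsets_used_frac
    remaining = EMERGENCY_MEMBERS_FRACTION - members_used_frac
    return int(configured * remaining / span)
```

The result is continuous at both ends of the ramp and never increases as members utilization grows, so autovacuum gets progressively more aggressive rather than switching modes abruptly. The same scaling could be applied to the other two multixact freeze settings.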

I think we can easily determine the rate of multixact member space
consumption and compare it to the rate of multixact ID consumption.
Given the historical multixact size (number of members per multixact),
it would be possible to scale the freeze ages by the same fraction, so
that autovac effectively behaves as if the members consumption rate,
rather than the ID consumption rate, is what drives freezing.  That way,
we don't have to jump suddenly from "normal" to "emergency" behavior at
some fixed threshold.
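
A minimal sketch of the rate-based alternative. It assumes that "the same fraction" means the ratio of multixact IDs consumed to members consumed over some recent window, which is one over the average number of members per multixact. The function name `scaled_freeze_ages` and the way consumption would be measured are hypothetical. The three keys are the real settings mentioned above, but the values used with them here are only examples.

```python
def scaled_freeze_ages(ages, mxids_consumed, members_consumed):
    """Scale every multixact freeze age by mxids_consumed / members_consumed.

    ages: dict mapping a setting name to its configured value.
    Returns a new dict; the input is left unchanged.
    """
    if mxids_consumed <= 0 or members_consumed <= mxids_consumed:
        # No data, or members are not being consumed faster than IDs:
        # keep the configured values.
        return dict(ages)
    factor = mxids_consumed / members_consumed  # strictly between 0 and 1
    return {name: int(value * factor) for name, value in ages.items()}


example_ages = {
    "autovacuum_multixact_freeze_max_age": 400_000_000,
    "vacuum_multixact_freeze_table_age": 150_000_000,
    "vacuum_multixact_freeze_min_age": 5_000_000,
}
```

With 1,000 multixacts consuming 4,000 members (4 members each on average), every age is scaled to a quarter. Since both the ID space and the member space are 32 bits wide, autovacuum then starts freezing roughly when member space is as full as ID space would otherwise have been. The factor varies continuously with the observed ratio, so there is no fixed switchover point.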

> This may not be the right proposal in detail, but I think we should do
> something.

No disagreement on that.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
