Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date
Msg-id CA+TgmoZej9dnnMfXdKOW9yYaShCN4f+3PRsXMY=zwYamOKFn+g@mail.gmail.com
In response to Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-bugs
On Mon, Apr 27, 2015 at 10:59 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Let me push a patch to fix the corruption, and then we can think of ways
> to teach autovacuum about the problem.

Sounds good to me.  Are you going to do that today?

> I'm not optimistic about that,
> honestly: as all GUC settings, these are individual for each process,
> and there's no way for one process to affect the values that are seen by
> other processes (autovac workers).  The only idea that comes to mind is
> to publish values in shared memory, and autovac workers would read them
> from there instead of using normal GUC values.

I don't think we could store values for the parameters directly in
shared memory, because I think that at least some of those GUCs are
per-session changeable.  But we might be able to store weighting
factors in shared memory that get applied to whatever the values in
the current session are.  Or else maybe each backend can just
recompute the information for itself when it needs it.
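Here is a minimal sketch of the weighting-factor idea. It assumes that one factor between 0 and 1 is published centrally and that each session multiplies its own GUC value by that factor. The class SharedPressure and its methods are hypothetical. A lock-protected field stands in for a slot in PostgreSQL shared memory; the sketch does not model real shared memory or real GUC handling.

```python
import threading


class SharedPressure:
    """Hypothetical stand-in for one shared-memory slot holding a weight.

    1.0 means no member-space pressure; 0.0 means freeze as aggressively
    as possible. Only this factor is shared; GUC values stay per-session.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._weight = 1.0

    def publish(self, weight: float) -> None:
        # Clamp so a bad measurement can never raise a setting above its GUC.
        with self._lock:
            self._weight = min(1.0, max(0.0, weight))

    def apply(self, session_value: int) -> int:
        # Scale whatever this session's own setting happens to be.
        with self._lock:
            weight = self._weight
        return int(session_value * weight)


pressure = SharedPressure()
pressure.publish(0.5)
print(pressure.apply(400_000_000))  # session A: 200000000
print(pressure.apply(100_000_000))  # session B: 50000000
```

The two sessions keep different settings. Only the shared factor changes, which sidesteps the problem of per-session GUCs.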

>> What I think we should do is notice when members utilization exceeds
>> offset utilization and progressively ramp back the effective value of
>> autovacuum_multixact_freeze_max_age (and maybe also
>> vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age)
>> so that autovacuum (and maybe also manual vacuums) get progressively
>> more aggressive about trying to advance relminmxid.  Suppose we decide
>> that when the "members" space is 75% used, we've got a big problem and
>> want to treat autovacuum_multixact_freeze_max_age as effectively
>> zero.
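The ramp described above could be read as follows. While the fraction of member space in use is at or below the fraction of offset space in use, nothing is reduced. After that, the effective value declines linearly and reaches zero at the 75% figure from the example. The function name and the linear curve are assumptions for illustration; PostgreSQL does not implement anything under this name.

```python
def effective_freeze_max_age(configured: int,
                             members_used: float,
                             offsets_used: float,
                             danger: float = 0.75) -> int:
    """Hypothetical: scale back a freeze age as member use outruns offset use.

    members_used and offsets_used are fractions (0.0-1.0) of each address
    space currently consumed. There is no reduction while members_used <=
    offsets_used. Beyond that, the result falls linearly to zero at `danger`.
    """
    if members_used <= offsets_used:
        return configured
    if members_used >= danger:
        return 0
    # Here offsets_used < members_used < danger, so the denominator is > 0.
    remaining = (danger - members_used) / (danger - offsets_used)
    return int(configured * remaining)


print(effective_freeze_max_age(400_000_000, 0.50, 0.25))  # 200000000
print(effective_freeze_max_age(400_000_000, 0.80, 0.25))  # 0
```

A linear curve is only one choice. Any monotonic curve that reaches zero at the danger point gives the same "gradual, not ballistic" property.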
>
> I think we can easily determine the rate of multixact member space
> consumption and compare to the rate of multixact ID consumption;
> considering the historical multixact size (number of members per
> multixact) it would be possible to change the freeze ages by the same
> fraction, so that autovac effectively behaves as if the members
> consumption rate is what is driving the freezing instead of ID
> consumption rate.  That way, we don't have to jump suddenly from
> "normal" to "emergency" behavior at some fixed threshold.
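Here is a sketch of the rate-based alternative. It assumes that the multixact ID space and the member space both hold 2^32 entries. Under that assumption, if multixacts average m members, the member space is exhausted roughly m times sooner than the ID space. The function and its inputs are hypothetical; the inputs are counts consumed over some unspecified observation window.

```python
def rate_scaled_freeze_age(configured: int,
                           members_consumed: int,
                           mxids_consumed: int) -> int:
    """Hypothetical: shrink a freeze age by the observed average mxact size.

    Assumes both address spaces are the same size, so members run out
    about `avg_members` times faster than multixact IDs do.
    """
    if mxids_consumed == 0:
        return configured
    avg_members = members_consumed / mxids_consumed
    if avg_members <= 1.0:
        return configured
    return int(configured / avg_members)


print(rate_scaled_freeze_age(400_000_000, 1_000_000, 100_000))  # 40000000
```

The result depends on whichever window produced the counts. A burst of large multixacts followed by a quiet period can therefore move it a lot, which is the concern raised in the next paragraph.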

Right.  I think that not jumping from normal mode to emergency mode is
quite important, and was trying to describe a system that would
gradually ramp up the pressure rather than a system that would do
nothing for a while and then suddenly go ballistic.

With regard to what you've outlined here, we need to make sure that if
the multixact rate varies widely, we still clean up before we hit
autovac wraparound.  That's why I think it should be driven off of the
fraction of the available address space which is currently consumed,
not some kind of short term measure of mxact size or generation rate.
I'm not sure exactly what you have in mind here.
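A measure driven off the fraction of address space consumed needs only two values: the oldest offset still needed and the next offset to be assigned. The following sketch assumes a 32-bit circular member space, and the function name is hypothetical. Modular subtraction keeps the result correct after the counter wraps past zero.

```python
SPACE = 2 ** 32  # assumption: 32-bit member offsets


def consumed_fraction(oldest: int, next_: int, space: int = SPACE) -> float:
    """Fraction of a circular address space lying between oldest and next_.

    Python's % always returns a non-negative result for a positive modulus,
    so this stays correct when next_ has wrapped around below oldest.
    """
    return ((next_ - oldest) % space) / space


print(consumed_fraction(0, 2 ** 31))                # 0.5
print(consumed_fraction(2 ** 32 - 2 ** 30, 2 ** 30))  # 0.5 (wrapped)
```

This fraction reflects actual consumption however the generation rate fluctuates, so it can safely drive a gradual ramp like the one described earlier in the thread.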

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
