Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date
Msg-id CAA4eK1+4qgv8JBMb0qR1tYCpABjWR9Qk-Lta2rimY6GT2HpvFw@mail.gmail.com
In response to Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
List pgsql-bugs
On Thu, Apr 30, 2015 at 10:47 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Wed, Apr 29, 2015 at 11:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > 3. currently there is some minimum limit of
> > autovacuum_multixact_freeze_max_age (10000000)
> > which might not be honored by this calculation, so not sure if that can
> > impact the system performance in some cases where it is currently
> > working sane.
>
> The reason why we need to be able to set the effective freeze age
> below that minimum in cases of high member data consumption rates is
> that you could hit the new member space wraparound prevention error
> before you consume anywhere near that many multixact IDs.  That
> minimum may well be entirely reasonable if the only thing you're
> worried about is multixact ID wraparound prevention.
>
> For example, my test program eats an average of 250 members per
> multixact ID when run with 500 sessions (each loop creates 500
> multixact IDs having 1, 2, 3, ..., 500 members).  At that rate, you'll
> run out of addressable member space after 2^32 / 250 = 17,179,869
> multixact IDs.  To prevent an error condition using only the existing
> multixact ID wraparound prevention machinery, we need to have an
> effective max table age (so that autovacuum wakes up and scans all
> tables) and min freeze age (so that it actually freezes the tuples)
> below that number.  So we have to ignore the GUC minimum in this
> situation.
>
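
To make the arithmetic above easy to re-run for other workloads, here is a
minimal sketch.  The helper name mxids_until_member_wrap is hypothetical (it
is not a PostgreSQL function), and it assumes, as in the quoted calculation,
that the addressable member space is 2^32 entries.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of PostgreSQL: estimate how many multixact
 * IDs can be created before the member space is exhausted, given the
 * average number of members per multixact ID.
 */
static uint64_t
mxids_until_member_wrap(uint32_t avg_members_per_mxid)
{
    /* Assumption: 2^32 addressable members, as in the quoted calculation. */
    const uint64_t member_space = UINT64_C(1) << 32;

    assert(avg_members_per_mxid > 0);
    return member_space / avg_members_per_mxid;
}
```

With an average of 250 members per multixact ID this gives 17,179,869,
which matches the figure quoted above.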

I understand that point.  I raised it so that, if there is some specific
reason for keeping the current minimum value, we can check that we have
not broken anything by not honouring the GUC minimum.  As far as I can
see from the code, there is one place (see the code below) where that
value is used, to calculate the warning limit for multixacts, and the
current patch doesn't seem to have any impact on it.

SetMultiXactIdLimit()
{
..
multiWarnLimit = multiStopLimit - 10000000;
}


> ...
>
> Observations:
>
> 1.  Sometimes the values don't change from minute to minute,
> presumably because there hasn't been a checkpoint to update
> pg_controldata on disk, but hopefully we can still see what's going on
> here despite the slight lag in the data.
>

Yeah, and I think this means that there will be no advancement of the
oldest multixact ID and no deletion of files until a checkpoint happens,
if checkpoints are configured to occur on a timeout.  I think there is no
harm in mentioning this in the documentation if it is not already covered.

> 2.  We get to somewhere in the 73-75% SLRU used range before
> wraparound vacuums are triggered.  We probably need to spread things
> out more than that.
>
> 3.  When the autovacuum runs, it advances oldest_mxid by different
> amounts each time; that's because I'm using the adjusted freeze max
> age (the max age of a table before it gets a wraparound vacuum) as our
> freeze min age (the max age for individual tuples before they're
> frozen) here:
>
> @@ -1931,7 +1964,9 @@ do_autovacuum(void)
>   {
>   default_freeze_min_age = vacuum_freeze_min_age;
>   default_freeze_table_age = vacuum_freeze_table_age;
> - default_multixact_freeze_min_age = vacuum_multixact_freeze_min_age;
> + default_multixact_freeze_min_age =
> + Min(vacuum_multixact_freeze_min_age,
> + autovacuum_multixact_freeze_max_age_adjusted());
>   default_multixact_freeze_table_age = vacuum_multixact_freeze_table_age;
>   }
>
> Without that change, autovacuum would trigger repeatedly as we got
> near 75% SLRU usage but not freeze anything, because
> default_multixact_freeze_min_age was higher than the age of any tuples
> (which had only made it to an age of around ~12 million; actually it's
> not exactly the tuple age per se... I don't fully understand the
> treatment of locker and updater multixact IDs in the vacuum code,
> HeapTupleSatisfiesVacuum and heap_freeze_tuple etc yet so I'm not sure
> exactly how that value translates into vacuum work, but I can see
> experimentally that a low multixact freeze min age is needed to get
> relminmxid moved forward).
>
> It's good that freeze table age ramps down so that the autovacuum
> launcher trigger point jumps around a bit and we spread the autovacuum
> launches over time, but it's not great that we finish up truncating
> different amounts of multixacts and associated SLRU each time.  We
> could instead use a freeze min age of 0 to force freezing of *all*
> tuples if this is a member-space-wraparound-prevention vacuum (that
> is, if autovacuum_multixact_freeze_max_age !=
> autovacuum_multixact_freeze_max_age_adjusted()).

We already cap the multixact freeze min age at half of
autovacuum_multixact_freeze_max_age, so that autovacuums to
prevent MultiXact wraparound won't occur too frequently, as per the
code below:

vacuum_set_xid_limits()
{
..
mxid_freezemin = Min(mxid_freezemin,
                     autovacuum_multixact_freeze_max_age / 2);
Assert(mxid_freezemin >= 0);
..
}

Now if we set it to zero, I think it might lead to excessive freezing,
and in turn more I/O, without any actual need for it (that is, without
needing more space for multixact members).
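
To illustrate what the clamp does with a few values, here is a small
standalone sketch.  It only mirrors the Min() expression from
vacuum_set_xid_limits(); the function name effective_mxid_freezemin and the
locally defined Min macro are assumptions made for this example, not the
actual vacuum code.

```c
#include <assert.h>

/* Local stand-in for the Min() macro, so the example is self-contained. */
#define Min(x, y) ((x) < (y) ? (x) : (y))

/*
 * Hypothetical helper: return the freeze min age that would be used after
 * capping the configured value at half of the freeze max age.
 */
static int
effective_mxid_freezemin(int configured_freeze_min_age, int freeze_max_age)
{
    int mxid_freezemin = Min(configured_freeze_min_age, freeze_max_age / 2);

    assert(mxid_freezemin >= 0);
    return mxid_freezemin;
}
```

For example, a configured value of 300 million with a max age of 400
million is capped at 200 million, while a configured value of zero stays
at zero, which is the case I am worried about above.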

>
> There is less to say about the results with an unpatched server: it
> drives in a straight line for a while, and then crashes into a wall
> (ie the new error preventing member wraparound), which I see you have
> also reproduced.  It's used up all of the circular member space, but
> only has around 17 million multixacts so autovacuum can't help you
> (it's not even possible to set autovacuum_multixact_freeze_max_age
> below 100 million), so to get things moving again you need to manually
> VACUUM FREEZE all databases including template databases.
>

In my tests, after setting the vacuum multixact parameters
(vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age)
to zero, the test completed successfully (no warning, and I could see
files being truncated in the members directory).  So one might argue
that in many cases the available member space can be reclaimed just by
setting appropriate values for the vacuum_multixact_* parameters.  Still,
I feel it is better to have an auto-adjustment algorithm like the one
this patch is attempting, so that the wraparound error can be avoided
even if those values are not set appropriately.  I think the only thing
we need to be cautious about is that the new calculation should not make
things worse (less aggressive) when the vacuum_multixact_* parameters
are already set to lower values.
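
One way to express that last condition is sketched below.  This is only an
illustration of the property I have in mind, not code from the patch;
choose_freeze_age is a hypothetical name, and "adjusted_age" stands for
whatever value the new calculation produces.

```c
#include <assert.h>

/*
 * Hypothetical helper: pick the more aggressive (smaller) of the age the
 * user configured and the age produced by the automatic adjustment, so
 * that the adjustment can never raise a value the user already set low.
 */
static int
choose_freeze_age(int configured_age, int adjusted_age)
{
    assert(configured_age >= 0 && adjusted_age >= 0);
    return configured_age < adjusted_age ? configured_age : adjusted_age;
}
```

With a configured age of zero, the result stays zero regardless of the
adjusted value; with a large configured age, the adjusted value wins.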



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
