Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

On Sun, May 3, 2015 at 4:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> About data, I have extracted parts where there is a change in
> oldest_mxid and segments
>
> time     segments usage_fraction usage_kb oldest_mxid next_mxid next_offset
>
> 13:48:36        1              0       16           1         1           0
> 13:49:36      369          .0044    94752           1         1           0
> ..
> 14:44:04    41703          .5083 10713400           1   8528909  2140755909
>
> 14:45:05     1374          .0167   352960     8573819   8722521  2189352521
> ..
> 15:37:16    41001          .4997 10529528     8573819  17060811  4282263311
> ..
> 15:38:16      709          .0086   182056    17132168  17254423    35892627
> ..
> 16:57:15    41440          .5051 10644712    17132168  25592713  2128803417
> ..
> 16:58:16     1120          .0136   287416    25695507  25786824  2177525278
>
> Based on this data, it seems that truncation of member space
> as well as advancement of the oldest multixact id happens once
> it reaches 50% usage, and at that time segments drops down to almost
> zero.  This happens repeatedly after 1 hour, and in between there
> is no progress, which indicates that all the work happens in
> one go rather than being spread out.  Won't this choke the system
> with I/O when it happens?  Isn't it better if we design it so
> that the work is spread over a period of time rather than done all in
> one go?

At 50% usage it starts vacuuming tables that have the very oldest mxid
in the system.  That is, at 50% usage, we try to select the smallest
non-zero fraction of tables that can be selected based on relminmxid
alone, with any luck exactly one or close to it.  As usage increases,
we decrease the cutoff age slowly so we start selecting tables with
slightly newer relminmxid values for vacuuming too.  Only if it
reaches 75% usage will it vacuum everything it can and eat all your
IO.  In a real system, I suppose there would be lots of big tables
that need to be vacuumed, the vacuuming would take some time, and they
would tend to have different relminmxids so that vacuuming would be
effectively spread out by this selection algorithm.  I think.  Perhaps
we should devise a test procedure to try to see if that happens.  We'd
need to create a bunch of big tables and modify monitor.sh to show the
relminmxid for each of them so that you could see when they are being
processed -- I will look into that.
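
To make the selection policy above concrete, here is a minimal sketch; it
is not the code from the attached patch.  It assumes a 2^32-member address
space, thresholds at exactly 50% and 75% of it, and a linear interpolation
between them (the text above only says the cutoff age decreases "slowly").
The function name and both macros are hypothetical.

```c
#include <stdint.h>

/* Hypothetical thresholds: 50% and 75% of an assumed 2^32-member space. */
#define SKETCH_SAFE_MEMBER_COUNT       (UINT64_C(1) << 31)
#define SKETCH_DANGEROUS_MEMBER_COUNT  (UINT64_C(3) << 30)

/*
 * Return the largest multixact age a table may have before it is selected
 * for vacuuming, or -1 if member usage imposes no limit at all.
 *
 * members:    number of member-space entries currently in use
 * multixacts: number of currently active multixacts
 */
static int64_t
sketch_max_multixact_age(uint64_t members, uint64_t multixacts)
{
    double   fraction;
    uint64_t victims;

    /* At or below 50% usage: no member-space pressure. */
    if (members <= SKETCH_SAFE_MEMBER_COUNT)
        return -1;

    /* How far we are between the safe and dangerous thresholds (assumed linear). */
    fraction = (double) (members - SKETCH_SAFE_MEMBER_COUNT) /
               (double) (SKETCH_DANGEROUS_MEMBER_COUNT - SKETCH_SAFE_MEMBER_COUNT);

    /* At or above 75% usage: every table with any multixact age qualifies. */
    if (fraction >= 1.0)
        return 0;

    /* Otherwise let only the oldest 'fraction' of active multixacts qualify. */
    victims = (uint64_t) ((double) multixacts * fraction);
    return (int64_t) (multixacts - victims);
}
```

Just above 50% usage the returned age is close to the full number of active
multixacts, so only tables holding the very oldest relminmxid are selected;
the cutoff then shrinks toward zero as usage approaches 75%.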

Restricting ourselves to selecting tables to vacuum using their
relminmxid alone makes this patch small since autovacuum already works
that way.  We *could* introduce code that would be able to spread out
the work of vacuuming tables that happen to have identical or very
close relminmxid (say by introducing some non-determinism or doing
something weird based on hashing table oids and the time to explicitly
spread the start of processing over time, or <your idea here>), but I
didn't want to propose anything too big/complicated/clever/stupid and
I suspect that the relminmxid values will tend to diverge over time
(but I could be wrong about that, if they all start at 1 and then move
forward in lockstep over long periods of time then what I propose is
not good enough... let's see if we can find out).

> --
> +int
> +compute_max_multixact_age_to_avoid_member_wrap(bool manual)
> {
> ..
> + if (members <= safe_member_count)
> + {
> + /*
> + * There is no danger of member wrap, so return a number that is not
> + * lower than autovacuum_multixact_freeze_max_age.
> + */
> + return -1;
> + }
> ..
>
> The above code doesn't seem to match its comments.
> Comment says "..not lower than autovacuum_multixact_freeze_max_age",
> but then return -1.  It seems to me here we should return unchanged
> autovacuum_multixact_freeze_max_age as it was coded in the initial
> version of patch.  Do you have any specific reason to change it?

Oops, the comment is fixed in the attached patch.

In an earlier version, I was only dealing with the autovacuum case.
Now that the VACUUM command also calls it, I didn't want this
compute_max_multixact_age_to_avoid_member_wrap function to assume that
it was being called by autovacuum code and return the
autovacuum-specific GUC in the case that no special action is needed.
Also, the function no longer computes a value by scaling
autovacuum_multixact_freeze_max_age; it now scales the current number
of active multixacts, so that we can begin selecting a small non-zero
number of tables to vacuum as soon as we exceed safe_member_count, as
described above.  (When we used a scaled-down
autovacuum_multixact_freeze_max_age, we usually didn't select any
tables at all until we had scaled it down a lot, i.e. until we got
close to dangerous_member_count.)  Finally, I wanted a special value
like -1 for 'none' so that table_recheck_autovac and ExecVacuum could
use a simple test >= 0 to know that they also need to set
multixact_freeze_min_age to zero in the case of a
member-space-triggered vacuum.  That way we get maximum benefit from
our table scans by freezing all relevant tuples, not just some older
ones.  That's why you see usage drop to almost zero each time, whereas
the monitor.sh results I showed from the earlier patch trimmed usage
by varying amounts, which meant that wraparound autovacuums would need
to be done again sooner.  Does that make sense?
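
The caller-side use of the -1 sentinel can be sketched roughly as follows.
This is an illustration only, not the patch itself: the struct, its field
names, and the helper are hypothetical, and taking the smaller of the two
maximum ages is an assumption about how a caller would combine them.

```c
/* Hypothetical container for the two freeze settings a caller adjusts. */
typedef struct SketchFreezeParams
{
    int multixact_freeze_min_age;   /* tuples older than this get frozen */
    int multixact_freeze_max_age;   /* tables older than this get vacuumed */
} SketchFreezeParams;

/*
 * Apply the member-space limit to a caller's freeze settings.
 * member_age_limit is -1 for "none", or a value >= 0 when member-space
 * usage has triggered the vacuum.
 */
static void
sketch_apply_member_limit(SketchFreezeParams *params, int member_age_limit)
{
    if (member_age_limit >= 0)
    {
        /* Assumption: never loosen an already tighter maximum age. */
        if (member_age_limit < params->multixact_freeze_max_age)
            params->multixact_freeze_max_age = member_age_limit;

        /* Freeze all relevant tuples, not just the older ones. */
        params->multixact_freeze_min_age = 0;
    }
}
```

With a limit of -1 the settings are left untouched, so ordinary vacuums
behave exactly as before.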

--
Thomas Munro
http://www.enterprisedb.com
