Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date
Msg-id CAEepm=2W9LyVy5ASPtH7cLnKsFnw5KGo3vRyteLXYhe=C+6ywQ@mail.gmail.com
Whole thread Raw
In response to Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
On Wed, Apr 29, 2015 at 11:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 1. It seems that you are using
> autovacuum_multixact_freeze_max_age_adjusted()
> only at couple of places, like it is not used in below calc:
>
> vacuum_set_xid_limits()
> {
> ..
> mxid_freezemin = Min(mxid_freezemin,
> autovacuum_multixact_freeze_max_age / 2);
> ..
> }
>
> What is the reason of using this calculation at some places and
> not at other places?

You're right, I will review the other places where that variable is
used and come back on this point.

> 2.
> @@ -2684,8 +2719,8 @@ relation_needs_vacanalyze(Oid relid,
>   : autovacuum_freeze_max_age;
>
>   multixact_freeze_max_age = (relopts && relopts->multixact_freeze_max_age
>>= 0)
> - ? Min(relopts->multixact_freeze_max_age,
> autovacuum_multixact_freeze_max_age)
> - : autovacuum_multixact_freeze_max_age;
> + ? Min(relopts->multixact_freeze_max_age,
> autovacuum_multixact_freeze_max_age_adjusted())
> + : autovacuum_multixact_freeze_max_age_adjusted();
>
>
> It seems that it will try to read from offset file for each
> relation which might or might not be good, shall we try to
> cache the oldestMultiXactMemberOffset?

True.  I will look into this.

> 3. currently there is some minimum limit of autovacuum_multixact_freeze_age
> (10000000)
> which might not be honored by this calculation, so not sure if that can
> impact the
> system performance in some cases where it is currently working sane.

The reason why we need to be able to set the effective freeze age
below that minimum in cases of high member data consumption rates is
that you could hit the new member space wraparound prevention error
before you consume anywhere near that many multixact IDs.  That
minimum may well be entirely reasonable if the only thing you're
worried about is multixact ID wraparound prevention.

For example, my test program eats an average of 250 members per
multixact ID when run with 500 sessions (each loop creates 500
multixact IDs having 1, 2, 3, ..., 500 members).  At that rate, you'll
run out of addressable member space after 2^32 / 250 = 17,179,869
multixact IDs.  To prevent an error condition using only the existing
multixact ID wraparound prevention machinery, we need to have an
effective max table age (so that autovacuum wakes up and scans all
tables) and min freeze age (so that it actually freezes the tuples)
below that number.  So we have to ignore the GUC minimum in this
situation.

> 4. Can you please share results that can show improvement
> with current patch versus un-patched master?

Here is my test procedure: initdb, default settings except
max_connections = 600 and shared_buffers = 1280MB, "CREATE TABLE foo
AS SELECT 42 AS id;", start monitor.sh (attached) and then run
explode_mxact_members.c (attached to my first message in this thread)
with arguments 500 100000.

Here are the edited highlights of the result for the patched server
(the complete results are attached):

time segments usage_fraction usage_kb oldest_mxid next_mxid next_offset
10:37:01 1 0 16 1 1 0
10:38:01 656 .0079 168368 1 1 0
10:39:01 1426 .0173 366112 1 190619 47845119
10:40:01 2201 .0268 565080 1 384730 96566980
10:41:01 2979 .0363 764760 1 578342 145163592
...
11:58:57 60455 .7368 15529256 1 12601248 3162912998
11:59:59 61184 .7457 15717760 1 12601248 3162912998
12:01:01 61918 .7547 15906288 1 12794860 3211509610
12:02:03 2178 .0265 559360 12610230 12988472 3260106222 <-- oldest_mxid moved
12:03:03 2936 .0357 753936 12610230 13182583 3308828083
12:04:03 3702 .0451 950704 12610230 13376694 3357549944
...
13:18:23 58236 .7098 14956680 12610230 24623156 1885444610
13:19:23 58965 .7187 15143520 12610230 24817267 1934166471
13:20:24 59690 .7275 15329864 12610230 25011378 1982888332
13:21:24 59709 .7278 15334664 12758932 25204990 2031484944 <-- oldest_mxid moved
13:22:25 60447 .7367 15524464 12758932 25204990 2031484944
13:23:25 60836 .7415 15624192 12833782 25399101 2080206805 <-- oldest_mxid moved
13:24:26 25068 .3055 6438312 20447005 25592713 2128803417
13:25:26 25826 .3147 6632888 20447005 25786824 2177525278
13:26:26 26583 .3240 6827456 20447005 25980935 2226247139
...
14:10:42 59115 .7205 15182368 20447005 32767834 3929758788
14:11:42 59836 .7293 15367912 20447005 32767834 3929758788
14:12:43 60571 .7383 15556680 20447005 32961446 3978355400
14:13:43 38557 .4699 9902472 25189521 33155557 4027077261 <-- oldest_mxid moved
14:14:44 39267 .4786 10084408 25189521 33349169 4075673873
14:15:44 39975 .4872 10266816 25189521 33349169 4075673873
...
14:43:56 59805 .7289 15359072 25189521 37615619 851585527
14:44:56 60492 .7373 15536024 25189521 37615619 851585527
14:45:57 60468 .7370 15529536 25342215 37809231 900182139
14:46:58 48309 .5888 12406904 28023821 38003342 948904000 <-- oldest_mxid moved
14:47:58 49010 .5973 12586784 28023821 38196954 997500612
14:48:58 49716 .6059 12768152 28023821 38391065 1046222473
...
15:02:04 58882 .7177 15122600 28023821 40136068 1484218226
15:03:04 59588 .7263 15303856 28023821 40330179 1532940087
15:04:05 60293 .7349 15484592 28023821 40523791 1581536699
15:05:06 24657 .3005 6332800 35601635 40717902 1630258560 <-- oldest_mxid moved
15:06:06 25397 .3095 6522664 35601635 40717902 1630258560
15:07:06 26140 .3186 6713296 35601635 40912013 1678980421
...
15:52:22 58685 .7153 15071912 35601635 47698912 3382492070
15:53:23 59403 .7240 15255936 35601635 47892524 3431088682
15:54:23 60118 .7327 15439648 35601635 48086635 3479810543 <-- oldest_mxid moved
15:55:24 50415 .6145 12947712 37776297 48280247 3528407155
15:56:24 51144 .6234 13135568 37776297 48280247 3528407155
15:57:25 51869 .6322 13321872 37776297 48474358 3577129016
...

Observations:

1.  Sometimes the values don't change from minute to minute,
presumably because there hasn't been a checkpoint to update
pg_controldata on disk, but hopefully we can still see what's going on
here despite the slight lag in the data.

2.  We get to somewhere in the 73-75% SLRU used range before
wraparound vacuums are triggered.  We probably need to spread things
out more that that.

3.  When the autovacuum runs, it advances oldest_mxid by different
amounts each time; that's because I'm using the adjusted freeze max
age (the max age of a table before it gets a wraparound vacuum) as our
freeze min age (the max age for individual tuples before they're
frozen) here:

@@ -1931,7 +1964,9 @@ do_autovacuum(void)
  {
  default_freeze_min_age = vacuum_freeze_min_age;
  default_freeze_table_age = vacuum_freeze_table_age;
- default_multixact_freeze_min_age = vacuum_multixact_freeze_min_age;
+ default_multixact_freeze_min_age =
+ Min(vacuum_multixact_freeze_min_age,
+ autovacuum_multixact_freeze_max_age_adjusted());
  default_multixact_freeze_table_age = vacuum_multixact_freeze_table_age;
  }

Without that change, autovacuum would trigger repeatedly as we got
near 75% SLRU usage but not freeze anything, because
default_multixact_freeze_min_age was higher than the age of any tuples
(which had only made it to an age of around ~12 million; actually it's
not exactly the tuple age per se... I don't fully understand the
treatment of locker and updater multixact IDs in the vacuum code,
HeapTupleSatisfiesVacuum and heap_freeze_tuple etc yet so I'm not sure
exactly how that value translates into vacuum work, but I can see
experimentally that a low multixact freeze min age is needed to get
relminxmid moved forward).

It's good that freeze table age ramps down so that the autovacuum
launcher trigger point jumps around a bit and we spread the autovacuum
launches over time, but it's not great that we finish up truncating
different amounts of multixacts and associated SLRU each time.  We
could instead use a freeze min age of 0 to force freezing of *all*
tuples if this is a member-space-wraparound-prevention vacuum (that
is, if autovacuum_multixact_freeze_max_age !=
autovacuum_multixact_freeze_max_age_adjusted()).  IIUC that'd be like
running VACUUM FREEZE and would space out our wraparound vacuums as
far apart as possible since it would trims SLRU space as much possible
when a wraparound vacuum runs, but it would presumably generate more
writes.  Maybe it doesn't matter that we drop different amounts of
multixact history each time, as long as the long term behaviour works.
Thoughts?

There is less to say about the results with an unpatched server: it
drives in a straight line for a while, and then crashes into a wall
(ie the new error preventing member wraparound), which I see you have
also reproduced.  It's used up all of the circular member space, but
only has around 17 million multixacts so autovacuum can't help you
(it's not even possible to set autovacuum_multixact_freeze_max_age
below 100 million), so to get things moving again you need to manually
VACUUM FREEZE all databases including template databases.

> 5.
> + /*
> + * TODO: In future, could oldestMultiXactMemberOffset be stored in shmem,
> + *
> pg_controdata, alongside oldestMultiXactId?
> + */
>
> You might want to write the comment as:
> XXX: We can store oldestMultiXactMemberOffset in shmem, pg_controldata
> alongside oldestMultiXactId?

Thanks, done.

> 6.
> + * Returns vacuum_multixact_freeze_max_age, adjusted down to prevent
> excessive use
> + * of addressable
> multixact member space if required.
>
> I think here you mean autovacuum_multixact_freeze_max_age?

Thanks, done.

FWIW, in some future release, I think we should consider getting a
bigger multixact member address space that wraps around at 2^48 or
2^64 instead of 2^32, so that we can sidestep the whole business and
go back to having just xid and mxid wraparounds to worry about.
pg_multixact/offsets would be 50% or 100% bigger (an extra byte or two
per multixact), but it's not very big.  pg_multiact/members would be
no bigger for any workload that currently works without hitting the
wraparound error, but could grow bigger if needed.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-bugs by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Next
From: David Rowley
Date:
Subject: Re: BUG #13195: ALTER TABLE with DEFAULT NULL is different than implicit default