Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |
Date | |
Msg-id | 20150604064226.GA99479@tornado.leadboat.com |
In response to | Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |
List | pgsql-hackers |
On Wed, Jun 03, 2015 at 04:53:46PM -0400, Robert Haas wrote:
> So here's a patch taking a different approach.  In this approach, if
> the multixact whose members we want to look up doesn't exist, we don't
> use a later one (that might or might not be valid).  Instead, we
> attempt to cope with the unknown.  That means:
>
> 1. In TruncateMultiXact(), we don't truncate.

I like that change a lot.  It's much easier to seek forgiveness for wasting
<= 28 GiB of disk than for deleting visibility information wrongly.

> 2. If setting the offset stop limit (the point where we refuse to
> create new multixact space), we don't arm the stop point.  This means
> that if you're in this situation, you run without member wraparound
> protection until it's corrected.  A message gets logged once per
> checkpoint telling you that you have this problem, and another message
> gets logged when things get straightened out and the guards are
> enabled.
>
> 3. If setting the vacuum force point, we assume that it's appropriate
> to immediately force vacuum.

Those seem reasonable, too.

> I've only tested this very lightly - this is just to see what you and
> Noah and others think of the approach.  As compared with the previous
> approach, it has the advantage of making minimal assumptions about the
> sanity of what's on disk.  It has the disadvantage that, for some
> people, the member-wraparound guard won't be enabled at startup -- but
> note that those people can't start 9.3.7/9.4.2 *at all*, so currently
> they are either running without member wraparound protection anyway
> (if they haven't upgraded to those releases) or they're down entirely.

That disadvantage is negligible, considering.

> Another disadvantage is that we'll be triggering what may be quite a
> bit of autovacuum activity for some people, which could be painful.
> On the plus side, they'll hopefully end up with sane relminmxid and
> datminmxid guards afterwards.

That sounds good so long as each table requires just one successful emergency
autovacuum.  I'm not seeing code to ensure that the launched autovacuum will
indeed perform a full-table scan and update relminmxid; is it there?

For sites that can't tolerate an autovacuum storm, what alternative can we
provide?  Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM <table>" of
every table, done before applying the minor update, sufficient?

>  static void
> -DetermineSafeOldestOffset(MultiXactId oldestMXact)
> +DetermineSafeOldestOffset(MultiXactOffset oldestMXact)

Leftover change from an earlier iteration?  The values passed continue to be
MultiXactId values.

>      /* move back to start of the corresponding segment */
> -    oldestOffset -= oldestOffset %
> -        (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
> +    offsetStopLimit = oldestOffset - (oldestOffset %
> +        (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
> +    /* always leave one segment before the wraparound point */
> +    offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
> +
> +    /* if nothing has changed, we're done */
> +    if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
> +        return;
>
>      LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
> -    /* always leave one segment before the wraparound point */
> -    MultiXactState->offsetStopLimit = oldestOffset -
> -        (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
> +    MultiXactState->offsetStopLimit = oldestOffset;

That last line needs s/oldestOffset/offsetStopLimit/, I presume.

Thanks,
nm
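
For readers following the arithmetic in the quoted hunk, here is a minimal standalone sketch of the stop-limit computation: round the oldest member offset down to the start of its segment, then back off one more segment. It is not the patch itself. The names `MemberOffset` and `compute_offset_stop_limit` are hypothetical, and the constants are assumed values for a default 8 kB block size; the real definitions live in the PostgreSQL source and should be checked there.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values for the default 8 kB block size; these stand in for
 * MULTIXACT_MEMBERS_PER_PAGE and SLRU_PAGES_PER_SEGMENT.
 */
#define MEMBERS_PER_PAGE    1636u
#define PAGES_PER_SEGMENT   32u
#define MEMBERS_PER_SEGMENT (MEMBERS_PER_PAGE * PAGES_PER_SEGMENT)

/* Hypothetical stand-in for MultiXactOffset (an unsigned 32-bit counter). */
typedef uint32_t MemberOffset;

/*
 * Return the offset at which new member space would be refused:
 * the start of the segment holding oldest_offset, minus one segment.
 */
static MemberOffset
compute_offset_stop_limit(MemberOffset oldest_offset,
                          uint32_t members_per_segment)
{
    MemberOffset stop_limit;

    assert(members_per_segment > 0);

    /* move back to the start of the corresponding segment */
    stop_limit = oldest_offset - (oldest_offset % members_per_segment);

    /*
     * always leave one segment before the wraparound point; unsigned
     * subtraction wraps modulo 2^32, matching a circular offset space
     */
    stop_limit -= members_per_segment;

    return stop_limit;
}
```

The value returned here corresponds to `offsetStopLimit` in the hunk, not to `oldestOffset`, which is why the final assignment in the quoted patch looks like it stores the wrong variable.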