Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-general
From | Thomas Munro |
---|---|
Subject | Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |
Date | |
Msg-id | CAEepm=2dzNcdKLs=cJEzXuvNXTj-6CMou1JT9g5uzsU6dErNcg@mail.gmail.com Whole thread Raw |
In response to | Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: 9.4.1 -> 9.4.2 problem: could not access status of
transaction 1
|
List | pgsql-general |
On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Here's a new version with some more fixes and improvements: > > - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset > when the oldest offset became known if the now-known value happened to > be zero. Fixed. > > - SetOffsetVacuumLimit now logs useful information at the DEBUG1 > level, so that you can see that it's doing what it's supposed to. > > - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the > offsetStopLimit even if it can't truncate anything. This seems > useless, but it's not, because it may be that the last checkpoint > advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real > value, and now we can actually set offsetStopLimit properly. > > - TruncateMultiXact no longer calls find_multixact_start when there > are no remaining multixacts. This is actually a completely separate > bug that goes all the way back to 9.3.0 and can potentially cause > TruncateMultiXact to remove every file in pg_multixact/offsets. > Restarting the cluster becomes impossible because TrimMultiXact barfs. > > - TruncateMultiXact now logs a message if the oldest multixact does > not precede the earliest one on disk and is not equal to the next > multixact and yet does not exist. The value of the log message is > that it discovered the bug mentioned in the previous line, so I think > it's earning its keep. > > With this version, I'm able to see that when you start up a > 9.3.latest+this patch with a cluster that has a bogus value of 1 in > relminmxid, datminmxid, and the control file, autovacuum vacuums > everything in sight, all the values get set back to the right thing, > and the next checkpoint enables the member-wraparound guards. This > works with both autovacuum=on and autovacuum=off; the emergency > mechanism kicks in as intended. We'll want to warn people with big > databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may > want to pre-vacuum those tables before upgrading to avoid a vacuum > storm. But generally I'm pretty happy with this: forcing those values > to get fixed so that we can guard against member-space wraparound > seems like the right thing to do. > > So, to summarize, this patch does the following: > > - Fixes the failure-to-start problems introduced in 9.4.2 in > complicated pg_upgrade scenarios. > - Prevents the new calls to find_multixact_start we added in 9.4.2 > from happening during recovery, where they can only create failure > scenarios. The call in TruncateMultiXact that has been there all > along is not eliminated, but now handles failure more gracefully. > - Fixes possible incorrect removal of every single > pg_multixact/offsets file when no multixacts exist; one file should be > kept. > - Forces aggressive autovacuuming when the control file's > oldestMultiXid doesn't point to a valid MultiXact and enables member > wraparound at the next checkpoint following the correction of that > problem. With this patch, when I run the script "checkpoint-segment-boundary.sh" from http://www.postgresql.org/message-id/CAEepm=1_KbHGbmPVmkUGE5qTP+B4efoCJYS0unGo-Mc5NV=UDg@mail.gmail.com I see the following during shutdown checkpoint: LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound That message comes from SimpleLruTruncate. -- Thomas Munro http://www.enterprisedb.com
pgsql-general by date: