On 2014-05-30 11:00:06 -0400, Bruce Momjian wrote:
> On Fri, May 30, 2014 at 03:32:44PM +0200, Andres Freund wrote:
> > On 2014-05-30 09:29:16 -0400, Bruce Momjian wrote:
> > > This is a bug in 9.3 pg_upgrade as well?
> >
> > Yes.
> >
> > > Why has no one reported it before?
> >
> > My guess is that it wasn't attributed to pg_upgrade in the
> > past. Typically the error will only occur a fair amount of time
> > later. You'll just see vacuums randomly erroring out with slru.c errors
> > about nonexistent files :(.
>
> But how much later? pg_upgrade is pretty popular now but I am just not
> seeing the number of errors I would expect:
>
> ERROR: could not access status of transaction 2072053907
> DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.
>
> I am not saying there is no bug, but from your analysis it would seem to
> be 100% of pg_upgrade'ed clusters that use multi-xacts.

It would have to be clusters that used more multixacts than fit into two
SLRU offsets/ segments in <= 9.2. Otherwise things will probably just
continue to work because there's no hole.

Also, the new cluster needs to have used more than
vacuum_multixact_freeze_min_age multis after the upgrade to make the
problem visible. I think.

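To put rough numbers on "two segments", here is a minimal sketch of how a multixact ID maps to an offsets/ segment file. It assumes the default 8 kB block size, 4-byte offset entries and 32 pages per SLRU segment; the constant and function names are made up for illustration and are not the ones used in the server source.

```python
# Sketch only: assumes default build settings (8 kB blocks, 4-byte
# multixact offset entries, 32 pages per SLRU segment).
BLCKSZ = 8192
OFFSET_ENTRY_SIZE = 4
PAGES_PER_SEGMENT = 32

OFFSETS_PER_PAGE = BLCKSZ // OFFSET_ENTRY_SIZE              # 2048
OFFSETS_PER_SEGMENT = OFFSETS_PER_PAGE * PAGES_PER_SEGMENT  # 65536


def offsets_segment_name(multixact_id: int) -> str:
    """Return the offsets/ segment file name holding this multixact ID."""
    segno = (multixact_id & 0xFFFFFFFF) // OFFSETS_PER_SEGMENT
    return f"{segno:04X}"


if __name__ == "__main__":
    # The ID from the error message quoted above.
    print(offsets_segment_name(2072053907))
    # Roughly how many multixacts fit into two segments.
    print(2 * OFFSETS_PER_SEGMENT)
```

Under those assumptions the ID from the error message lands in segment 7B81, which matches the file name in the DETAIL line, and two segments cover roughly 131072 multixacts.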
> If so, it seems we would need to tell everyone to remove the 0000 files
> if there are higher numbered ones with numbering gaps.

The problem is that it's not actually that easy to define whether
there's a gap: after a multixact ID wraparound the 0000 file might
legitimately exist again. So I'd rather not give a simple instruction
that might delete critical data.

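To illustrate the ambiguity, a small sketch of the naive rule "remove 0000 if there are higher-numbered segments with a gap in between", applied to two hypothetical directory listings. The function name and both listings are invented for this example; it only looks at file names and is not a proposal for an actual check.

```python
def naive_rule_flags_0000(segment_names):
    """Naive rule: 0000 exists, and the next existing segment is not 0001."""
    segnos = sorted(int(name, 16) for name in segment_names)
    if 0 not in segnos or len(segnos) < 2:
        return False
    # A gap exists if the segment following 0000 is not 0001.
    return segnos[1] != 1


# Hypothetical listing after a bad pg_upgrade: 0000 is a stale leftover.
leftover = ["0000", "7B80", "7B81"]

# Hypothetical listing after multixact ID wraparound: 0000 is live data
# that follows FFFF.
wrapped = ["FFFE", "FFFF", "0000"]

if __name__ == "__main__":
    print(naive_rule_flags_0000(leftover))
    print(naive_rule_flags_0000(wrapped))
```

Both listings are flagged, so the rule cannot tell the stale 0000 from the one that holds current data after wraparound.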
> Is this something our next minor release should fix in the multi-xacts
> code?

I wondered whether we could detect that case and deal with it
transparently, but haven't come up with anything smart. Alvaro?

Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services