Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts - Mailing list pgsql-bugs

From Andres Freund
Subject Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
Msg-id 20140530151317.GB1220@awork2.anarazel.de
In response to Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts  (Bruce Momjian <bruce@momjian.us>)
Responses Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts  (Bruce Momjian <bruce@momjian.us>)
List pgsql-bugs
On 2014-05-30 11:00:06 -0400, Bruce Momjian wrote:
> On Fri, May 30, 2014 at 03:32:44PM +0200, Andres Freund wrote:
> > On 2014-05-30 09:29:16 -0400, Bruce Momjian wrote:
> > > This is a bug in 9.3 pg_upgrade as well?
> >
> > Yes.
> >
> > >  Why has no one reported it before?
> >
> > My guess is that it wasn't attributed to pg_upgrade in the
> > past. Typically the error will only occur a fair amount of time
> > later. You'll just see vacuums randomly erroring out with slru.c errors
> > about nonexistent files :(.
>
> But how much later?  pg_upgrade is pretty popular now but I am just not
> seeing the number of errors I would expect:
>
>     ERROR: could not access status of transaction 2072053907
>     DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

> I am not saying there is no bug, but from your analysis it would seem to
> be 100% of pg_upgrade'ed clusters that use multi-xacts.

It'd need to be clusters that used more multixacts than fit onto two
SLRU offsets/ segments in < 9.3. Otherwise things will probably just
continue to work because there's no hole.
Also, the new cluster needs to have used more than
vacuum_multixact_freeze_min_age multis after the upgrade to make the
problem visible. I think.
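
To make the "two segments" threshold concrete, here is a minimal sketch of how a multixact ID maps to a pg_multixact/offsets segment file. It assumes the default 8 kB block size, 4-byte offset entries and 32 pages per SLRU segment; the function and constant names are made up for illustration and are not PostgreSQL APIs.

```python
# Assumptions (not stated in this thread): default 8 kB block size,
# 4-byte multixact offsets, 32 pages per SLRU segment.
BLOCK_SIZE = 8192
OFFSET_ENTRY_SIZE = 4
PAGES_PER_SEGMENT = 32

# Number of multixact IDs whose offsets fit in one segment file.
MULTIXACTS_PER_SEGMENT = (BLOCK_SIZE // OFFSET_ENTRY_SIZE) * PAGES_PER_SEGMENT


def offsets_segment_name(multixact_id: int) -> str:
    """Return the offsets/ segment file name (hypothetical helper)."""
    return f"{multixact_id // MULTIXACTS_PER_SEGMENT:04X}"


if __name__ == "__main__":
    # The ID from the error message quoted above.
    print(offsets_segment_name(2072053907))
    # How many multixacts fit onto two segments under these assumptions.
    print(2 * MULTIXACTS_PER_SEGMENT)
```

Under these assumptions one segment covers 65536 multixacts, so two segments cover 131072, and the ID 2072053907 from the quoted error lands in segment 7B81, which matches the file name in the DETAIL line.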

> If so, it seems we would need to tell everyone to remove the 0000 files
> if there are higher numbered ones with numbering gaps.

The problem is that it's not actually that easy to define whether
there's a gap - after a multixact id wraparound the 0000 file might
exist again. So I'd rather not give a simple instruction that might
delete critical data.
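
A rough sketch of why the file names alone are ambiguous: it groups offsets/ segment names into runs that are contiguous modulo wraparound. It assumes 65536 multixacts per segment (default 8 kB blocks), so the 32-bit ID space spans segments 0000 to FFFF; the helper name is made up. It only reads a list of names and does not decide which files are safe to remove.

```python
# Assumption: 2**32 multixact IDs / 65536 IDs per segment = 65536 segments.
SEGMENTS_IN_ID_SPACE = 1 << 16


def circular_runs(segment_names):
    """Group segment names into runs contiguous modulo wraparound (hypothetical helper)."""
    segs = sorted({int(name, 16) for name in segment_names})
    if not segs:
        return []
    runs = [[segs[0]]]
    for seg in segs[1:]:
        if seg == runs[-1][-1] + 1:
            runs[-1].append(seg)   # extends the current run
        else:
            runs.append([seg])     # a hole: start a new run
    # FFFF is followed by 0000, so join the last run to the first if they touch.
    if len(runs) > 1 and runs[-1][-1] == SEGMENTS_IN_ID_SPACE - 1 and runs[0][0] == 0:
        runs[0] = runs.pop() + runs[0]
    return [[f"{s:04X}" for s in run] for run in runs]


if __name__ == "__main__":
    # Stale 0000 left behind, with a hole before the live segments.
    print(circular_runs(["0000", "7B80", "7B81"]))
    # After wraparound 0000 is live and adjacent to FFFF: no hole.
    print(circular_runs(["FFFE", "FFFF", "0000"]))
```

In the first case 0000 sits in its own run; in the second it is part of the live range. More than one run suggests a hole, but which run is the live one cannot be told from the names, so the sketch stops short of recommending any deletion.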

> Is this something our next minor release should fix in the multi-xacts
> code?

I wondered whether we could detect that case and deal with it
transparently, but haven't come up with something smart. Alvaro?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
