Re: MultiXactId error after upgrade to 9.3.4 - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: MultiXactId error after upgrade to 9.3.4
Date
Msg-id 20140423184214.GQ25695@eldon.alvh.no-ip.org
In response to Re: MultiXactId error after upgrade to 9.3.4  (Bruce Momjian <bruce@momjian.us>)
Responses Re: MultiXactId error after upgrade to 9.3.4  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Bruce Momjian wrote:
> On Wed, Apr 23, 2014 at 03:01:02PM -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:
> > > > My conclusion here is that some part of the code is failing to examine
> > > > XMAX_INVALID before looking at the value stored in xmax itself.  There
> > > > ought to be a short-circuit.  Fortunately, this bug should be pretty
> > > > harmless.
> > > > 
> > > > .. and after looking, I'm fairly sure the bug is in
> > > > heap_tuple_needs_freeze.
> > > 
> > > heap_tuple_needs_freeze() isn't *allowed* to look at
> > > XMAX_INVALID. Otherwise it could miss freezing something still visible
> > > on a standby or after an eventual crash.
> > 
> > I think what we should do here is that if we see that XMAX_INVALID is
> > set, we just reset everything to zero without checking the multixact
> > contents.  Something like the attached (warning: hand-edited, line
> > numbers might be bogus)
> > 
> > I still don't know under what circumstances this situation could arise.
> > This seems most strange to me.  I would wonder whether this is just
> > papering over a different bug elsewhere, except that we know this tuple
> > comes from a pg_upgraded table and so I think the only real solution is
> > to cope.
> 
> Shouldn't we log something at least if we are unsure of the cause?

I don't know.  Is it possible that XMAX_IS_MULTI got set because of
cosmic rays?  At this point that's the only explanation that makes sense
to me.  And I'm not sure what to do about this until we know more --
more user reports of this problem, for instance.

I don't see any reasonable way to distinguish this particular kind of
multixact-out-of-bounds situation from any other, so I'm not sure what
else to log either (as you can see, we already emit an error message).
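To illustrate the short-circuit discussed in this thread (if XMAX_INVALID is set, reset xmax to zero without consulting the multixact), here is a minimal, self-contained C sketch. It is not PostgreSQL's freezing code or the attached patch. The struct `TupleHeader`, the functions `multixact_precedes`, `check_multixact_in_range` and `freeze_xmax`, and the flag values are all hypothetical stand-ins. The sketch shows why a tuple from a pg_upgraded table whose xmax carries a stale, out-of-range multixact raises an error only when the XMAX_INVALID bit is ignored. It models only the freezing step and does not address Andres's point that heap_tuple_needs_freeze() must not rely on XMAX_INVALID.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t MultiXactId;

/* Hypothetical infomask bits, for illustration only. */
#define XMAX_INVALID  0x0800
#define XMAX_IS_MULTI 0x1000
#define INVALID_XID   0u

/* Hypothetical, heavily simplified tuple header. */
typedef struct {
    uint16_t infomask;
    uint32_t xmax;      /* a plain xid, or a MultiXactId if XMAX_IS_MULTI */
} TupleHeader;

/* Wraparound-aware "a precedes b": treat the 32-bit difference as signed. */
static int multixact_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t)(a - b) < 0;
}

/* Returns 0 if id lies in [oldest, next); otherwise reports an error, returns -1. */
static int check_multixact_in_range(MultiXactId id, MultiXactId oldest,
                                    MultiXactId next)
{
    if (multixact_precedes(id, oldest) || !multixact_precedes(id, next)) {
        fprintf(stderr, "MultiXactId %u is outside [%u, %u)\n",
                (unsigned) id, (unsigned) oldest, (unsigned) next);
        return -1;
    }
    return 0;
}

/*
 * Freeze the xmax of one tuple.  Returns 0 on success, or -1 if an
 * out-of-range multixact had to be consulted.
 */
static int freeze_xmax(TupleHeader *tup, MultiXactId oldest, MultiXactId next)
{
    if (tup->infomask & XMAX_INVALID) {
        /* Short-circuit: xmax is known invalid, so the stored value
         * (possibly a stale multixact) is discarded without a lookup. */
        tup->xmax = INVALID_XID;
        tup->infomask = (uint16_t)(tup->infomask & ~XMAX_IS_MULTI);
        return 0;
    }
    if (tup->infomask & XMAX_IS_MULTI) {
        /* Without the short-circuit, a stale multixact ends up here. */
        if (check_multixact_in_range(tup->xmax, oldest, next) != 0)
            return -1;
        /* Member inspection would follow here; omitted in this sketch. */
    }
    return 0;
}
```

With `oldest = 100` and `next = 200`, a tuple whose infomask has both bits set and whose xmax is 5000 is reset cleanly. The same xmax with only XMAX_IS_MULTI set takes the lookup path and reports the out-of-bounds error, which is indistinguishable from any other out-of-range multixact.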

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


