Re: assertion failure 9.3.4 - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: assertion failure 9.3.4
Date
Msg-id 20140422210140.GI25695@eldon.alvh.no-ip.org
Whole thread Raw
In response to Re: assertion failure 9.3.4  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: assertion failure 9.3.4
List pgsql-hackers
Andres Freund wrote:
> On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote:
> >
> > On 04/21/2014 02:54 PM, Andres Freund wrote:
> > >Hi,
> > >
> > >I spent the last two hours poking arounds in the environment Andrew
> > >provided and I was able to reproduce the issue, find a assert to
> > >reproduce it much faster and find a possible root cause.
> >
> >
> > What's the assert that makes it happen faster? That might help a lot in
> > constructing a self-contained test.
>
> Assertion and *preliminary*, *hacky* fix attached.

Thanks for the analysis and patches.  I've been playing with this on my
own a bit, and one thing that I just noticed is that at least for
heap_update I cannot reproduce a problem when the xmax is originally a
multixact, so AFAICT the number of places that need patched aren't as
many.

Some testing later, I think the issue only occurs if we determine that
we don't need to wait for the xid/multi to complete, because otherwise
the wait itself saves us.  (It's easy to cause the problem by adding a
breakpoint in heapam.c:3325, i.e. just before re-acquiring the buffer
lock, and then having transaction A lock for key share, then transaction
B update the tuple which stops at the breakpoint, then transaction A
also update the tuple, and finally release transaction B).

For now I offer a cleaned up version of your patch to add the assertion
that multis don't contain multiple updates.  I considered the idea of
making this #ifdef USE_ASSERT_CHECKING, because it has to walk the
complete array of members; and then have full elogs in MultiXactIdExpand
and MultiXactIdCreate, which are lighter because they can check more
easily.  But on second thoughts I refrained from doing that, because
surely the arrays are not as large anyway, are they.

I think I should push this patch first, so that Andrew and Josh can try
their respective test cases which should start throwing errors, then
push the actual fixes.  Does that sound okay?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Next
From: Josh Berkus
Date:
Subject: Re: assertion failure 9.3.4