
Re: assertion failure 9.3.4 - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: assertion failure 9.3.4
Date	April 21, 2014 18:54:27
Msg-id	20140421185422.GA13906@alap3.anarazel.de
In response to	assertion failure 9.3.4 (Andrew Dunstan <andrew.dunstan@pgexperts.com>)
List	pgsql-hackers


Hi,

I spent the last two hours poking around in the environment Andrew
provided. I was able to reproduce the issue, add an assert that
reproduces it much faster, and find a possible root cause.

Since the symptom of the problem seems to be multixacts with more than
one updating xid, I added a check to MultiXactIdCreateFromMembers()
that prevents that. That requires moving ISUPDATE_from_mxstatus() to a
header, but I think we should definitely add such an assert.

As it turns out, the problem is in the

    else if (result == HeapTupleBeingUpdated && wait)

branch in (at least) heap_update(). When the problem is hit, the
to-be-updated tuple originally has HEAP_XMIN_COMMITTED |
HEAP_XMAX_LOCK_ONLY | HEAP_XMAX_KEYSHR_LOCK set. So we release the
buffer lock, acquire the tuple lock, and reacquire the buffer lock. But
in between, the locking backend has actually updated the tuple.
The code tries to protect against that with:

            /*
             * recheck the locker; if someone else changed the tuple while
             * we weren't looking, start over.
             */
            if ((oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
                !TransactionIdEquals(
                    HeapTupleHeaderGetRawXmax(oldtup.t_data),
                    xwait))
                goto l2;

            can_continue = true;
            locker_remains = true;

and similar. The problem is that in Andrew's case the infomask changes
from 0x2192 to 0x2102 (i.e. it's a normal update afterwards), while xmax
stays the same. Ooops.
A bit later there's:

    result = can_continue ? HeapTupleMayBeUpdated : HeapTupleUpdated;

So from there on we happily continue to update the tuple, thinking
there's no previous updater, which obviously causes problems.

I've hacked^Wfixed this by changing the infomask test above to
infomask != oldtup.t_data->t_infomask in a couple of places. With that,
the testcase survived a couple of runs, so it seems to be sufficient.

I am too hungry right now to think about a proper fix for this and
whether there are further problematic areas.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
