Re: Errors on missing pg_subtrans/ files with 9.3 - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Errors on missing pg_subtrans/ files with 9.3
Date
Msg-id 20131125201039.GF6597@eldon.alvh.no-ip.org
Whole thread Raw
In response to Re: Errors on missing pg_subtrans/ files with 9.3  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Errors on missing pg_subtrans/ files with 9.3  (Andres Freund <andres@2ndquadrant.com>)
Re: Errors on missing pg_subtrans/ files with 9.3  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Andres Freund escribió:

> Ok, this is helpful. Do you rather longrunning transactions? The
> transaction that does foreign key checks has an xid of 10260613, while
> the row that's getting checked has 13514992.

Thanks for the analysis.

> #4  0x0000000000635dc7 in XactLockTableWait (xid=13514992) at lmgr.c:501
>         tag = {locktag_field1 = 13514992, locktag_field2 = 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type =
4'\004', locktag_lockmethodid = 1 '\001'}
 
> #5  0x0000000000482223 in heap_lock_updated_tuple_rec (rel=0x2b20f050a8d0, tuple=<value optimized out>, ctid=<value
optimizedout>, xid=10260613, mode=LockTupleKeyShare) at heapam.c:4847
 
> 
> I am not sure whether that's the origin of the problem but at the very
> least it seems to me that heap_lock_updated_tuple_rec() is missing
> several important pieces:
> a) do the priorXmax==xmin dance to check we're still following the same
>    ctid chain. Currently we could easily stumble across completely
>    unrelated tuples if a chain element aborted and got vacuumed.

This seems simple to handle by adding the check you propose to the loop.
Basically if the xmax doesn't match the xmin, we reached the end,
there's nothing more to lock and we can return success without any
further work:
    /*     * Check the tuple XMIN against prior XMAX, if any.  If we reached     * the end of the chain, we're done, so
returnsuccess.     */    if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(HeapTupleHeaderGetXmin(&mytup),                            priorXmax))    {
UnlockReleaseBuffer(buf);       return HeapTupleMayBeUpdated;    }
 


> b) Check whether a chain element actually aborted - currently we're
>    only doing that in the HEAP_KEYS_UPDATED updated case, but that seems
>    wrong (we can't check for committed tho!).

Let me point out that this is exactly the same code that would be
affected by my proposed fix for #8434, which would have this check the
updateXid in all cases, not only in KEYS_UPDATED as currently.

I don't understand your comment about "can't check for committed".  I
mean, if the updating transaction committed, then we need to return
failure to caller, HeapTupleUpdated.  This signals to the caller that
the future version of the tuple is locked and the whole thing needs to
be failed.  (In READ COMMITTED isolation level, this would cause
EvalPlanQual to fetch the updated version of the tuple and redo the
whole thing from there.  In REPEATABLE READ and above, the whole
operation would be aborted.)

In short I propose to fix part this with the simple patch I proposed for
bug #8434.

> c) (reported separately as well) cope with failure of heap_fetch() to
>    return anything.

Proposed patch for this was posted in the other thread.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: [PATCH] SQL assertions prototype
Next
From: AK
Date:
Subject: Re: Why is UPDATE with column-list syntax not implemented