Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
Date
Msg-id 15875.1354298007@sss.pgh.pa.us
Whole thread Raw
In response to Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?
List pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-11-30 10:42:21 -0500, Tom Lane wrote:
>> I am suspicious that it is safe, because it's pretty darn hard to
>> believe we'd not have seen field reports of problems with triggers
>> otherwise.  It's not like this is a seldom-executed code path.

> Thats true, but I think due to the fact that the plain DELETEs do *not*
> trigger the Assert we might just not have noticed it.

OK, I've managed to reproduce an actual failure, ie VACUUM moving the
tuple underneath the fetch.  It's not easy to do though, and the timing
has to be *really* tight.

The reason that simple update/delete queries don't show the issue is
that the tuple being fetched is the "old" one that was just
updated/deleted, and in a simple query that one was just returned by a
relation-scan plan node that's underneath the ModifyTuple node, and
*that scan node still has pin* on the table page; or more accurately the
TupleTableSlot it's returned the scan tuple in has the pin.  This pin's
been held since we did a LockBuffer to examine the tuple's liveness, so
it's sufficient to prevent anyone from getting a cleanup lock.  However,
in a more complex plan such as a join, the ModifyTable node could be
receiving tuples that aren't the current tuple of a scan anymore.
(Your example case passes the tuples through a Sort, so none of them
are pinned by the time the ModifyTable node gets them.)

Another thing that reduces the odds of seeing this, in recent versions,
is that when we first scanned the page containing the "old" tuple,
we'll have (tried to) do a page prune.  So even if a VACUUM comes along
now, the odds are that it will not find any newly-reclaimable space.
To reproduce the problem, I had to arrange for another tuple on the
same page to become reclaimable between the relation scan and the
GetTupleForTrigger fetch (which I did by having another, serializable
transaction scan the table, then deleting the other tuple, then allowing
the serializable transaction to commit after the page prune step).

Lastly, the only way you see a problem is if VACUUM acquires cleanup
lock before GetTupleForTrigger pins the page, finds some reclaimable
space, and moves the target tuple on the page in order to compact the
free space, and that data movement happens in the narrow time window
between where GetTupleForTrigger does PageGetItem() and where it
finishes the heap_copytuple() call just below that.  Even then, you
might escape seeing a problem, unless the data movement also shifts
some *other* tuple into the space formerly occupied by the target.

So that's why nobody's reported the problem --- it's not happening
often enough for anyone to see it as a repeatable issue.

I'll go insert a LockBuffer call.  Good catch!
        regards, tom lane



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: WIP json generation enhancements
Next
From: Andres Freund
Date:
Subject: Re: Re: missing LockBuffer(buffer, BUFFER_LOCK_SHARE) in trigger.c GetTupleForTrigger?