Ever seen transient garbage results from DELETE RETURNING? - Mailing list pgsql-hackers

From Tom Lane
Subject Ever seen transient garbage results from DELETE RETURNING?
Date
Msg-id 24658.1362844878@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
While hacking on the writable-foreign-tables patch, my attention was
drawn to what seems to be a pre-existing bug.  Consider the section
of ExecDelete() that computes the results for DELETE RETURNING:
   /* Process RETURNING if present */   if (resultRelInfo->ri_projectReturning)   {       /*        * We have to put
thetarget tuple into a slot, which means first we        * gotta fetch it.  We can use the trigger tuple slot.
*/      TupleTableSlot *slot = estate->es_trig_tuple_slot;       TupleTableSlot *rslot;       HeapTupleData deltuple;
   Buffer        delbuffer;
 
       if (oldtuple != NULL)       {           deltuple.t_data = oldtuple;           deltuple.t_len =
HeapTupleHeaderGetDatumLength(oldtuple);          ItemPointerSetInvalid(&(deltuple.t_self));
deltuple.t_tableOid= InvalidOid;           delbuffer = InvalidBuffer;       }       else       {
deltuple.t_self= *tupleid;           if (!heap_fetch(resultRelationDesc, SnapshotAny,
&deltuple,&delbuffer, false, NULL))               elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
  }
 
       if (slot->tts_tupleDescriptor != RelationGetDescr(resultRelationDesc))           ExecSetSlotDescriptor(slot,
RelationGetDescr(resultRelationDesc));      ExecStoreTuple(&deltuple, slot, InvalidBuffer, false);
 
       rslot = ExecProcessReturning(resultRelInfo->ri_projectReturning,                                    slot,
planSlot);
       ExecClearTuple(slot);       if (BufferIsValid(delbuffer))           ReleaseBuffer(delbuffer);
       return rslot;   }

In the normal code path where we're deleting from a plain table, we
fetch back the just-deleted row (using SnapshotAny), compute the
RETURNING expressions, then release the pin on the row's buffer.
But suppose that some of the RETURNING expressions are simple Vars
of pass-by-reference data types.  What we will end up with after
ExecProcessReturning is a "virtual tuple" (that is, an array of
Datum values) some of which are pass-by-reference pointers pointing
right into the disk buffer.  As soon as we ReleaseBuffer, those are
dangling pointers, because the disk buffer is now subject to change.
If the buffer did change before the RETURNING results got sent off
to the client, we'd return garbage, or maybe even crash trying to
interpret the fields.

The obvious mechanism for this, that the shared buffer's content
gets replaced by another page, is probably pretty improbable because
our own query has just made at least two separate touches of the page;
it should have a high enough LRU count to protect it for a little
while.  The only other possibility I can think of is that HOT pruning
or VACUUM comes along and shuffles data on the page.  This is probably
not very likely either, because our own query would have done a round
of HOT pruning when it first touched the page; barring some commits
in between, there wouldn't be much reason for the page to change further.

So this seems to be a live bug but one that's quite unlikely to be
seen in the field, and it wouldn't be reproducible at all either.
I'm curious if anyone has seen symptoms that might match this.

The fix is simple: insert an ExecMaterializeSlot(rslot) before releasing
the underlying data.  It will add a few cycles but there seems no help
for that; the whole point is we gotta copy data off the disk buffer
before releasing that pin.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Amit kapila
Date:
Subject: Re: Identity projection
Next
From: Simon Riggs
Date:
Subject: Re: Enabling Checksums