Ever seen transient garbage results from DELETE RETURNING? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Ever seen transient garbage results from DELETE RETURNING? |
Date | |
Msg-id | 24658.1362844878@sss.pgh.pa.us Whole thread Raw |
List | pgsql-hackers |
While hacking on the writable-foreign-tables patch, my attention was drawn to what seems to be a pre-existing bug. Consider the section of ExecDelete() that computes the results for DELETE RETURNING: /* Process RETURNING if present */ if (resultRelInfo->ri_projectReturning) { /* * We have to put thetarget tuple into a slot, which means first we * gotta fetch it. We can use the trigger tuple slot. */ TupleTableSlot *slot = estate->es_trig_tuple_slot; TupleTableSlot *rslot; HeapTupleData deltuple; Buffer delbuffer; if (oldtuple != NULL) { deltuple.t_data = oldtuple; deltuple.t_len = HeapTupleHeaderGetDatumLength(oldtuple); ItemPointerSetInvalid(&(deltuple.t_self)); deltuple.t_tableOid= InvalidOid; delbuffer = InvalidBuffer; } else { deltuple.t_self= *tupleid; if (!heap_fetch(resultRelationDesc, SnapshotAny, &deltuple,&delbuffer, false, NULL)) elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING"); } if (slot->tts_tupleDescriptor != RelationGetDescr(resultRelationDesc)) ExecSetSlotDescriptor(slot, RelationGetDescr(resultRelationDesc)); ExecStoreTuple(&deltuple, slot, InvalidBuffer, false); rslot = ExecProcessReturning(resultRelInfo->ri_projectReturning, slot, planSlot); ExecClearTuple(slot); if (BufferIsValid(delbuffer)) ReleaseBuffer(delbuffer); return rslot; } In the normal code path where we're deleting from a plain table, we fetch back the just-deleted row (using SnapshotAny), compute the RETURNING expressions, then release the pin on the row's buffer. But suppose that some of the RETURNING expressions are simple Vars of pass-by-reference data types. What we will end up with after ExecProcessReturning is a "virtual tuple" (that is, an array of Datum values) some of which are pass-by-reference pointers pointing right into the disk buffer. As soon as we ReleaseBuffer, those are dangling pointers, because the disk buffer is now subject to change. If the buffer did change before the RETURNING results got sent off to the client, we'd return garbage, or maybe even crash trying to interpret the fields. The obvious mechanism for this, that the shared buffer's content gets replaced by another page, is probably pretty improbable because our own query has just made at least two separate touches of the page; it should have a high enough LRU count to protect it for a little while. The only other possibility I can think of is that HOT pruning or VACUUM comes along and shuffles data on the page. This is probably not very likely either, because our own query would have done a round of HOT pruning when it first touched the page; barring some commits in between, there wouldn't be much reason for the page to change further. So this seems to be a live bug but one that's quite unlikely to be seen in the field, and it wouldn't be reproducible at all either. I'm curious if anyone has seen symptoms that might match this. The fix is simple: insert an ExecMaterializeSlot(rslot) before releasing the underlying data. It will add a few cycles but there seems no help for that; the whole point is we gotta copy data off the disk buffer before releasing that pin. regards, tom lane
pgsql-hackers by date: