Thread: Planned change of ExecRestrPos API
I'm planning to change ExecRestrPos and the routines it calls so that an updated TupleTableSlot holding the restored-to tuple is explicitly returned. Currently, since nothing is explicitly done to the result Slot of a plan node when we restore its position, you might think that the Slot still points at the tuple that was current just before the Restore. You'd be wrong though, at least for seqscan and indexscan plans (I haven't looked yet at the other node types that support mark/restore). The reason is that the restore operation changes the contents of a HeapTupleData struct in the scan state (rs_ctup or xs_ctup) and all that the Slot really contains is a pointer to that struct. Now this is really bad. In the first place, the Slot thinks it has a pin on the buffer containing its current tuple. After a Restore, it may have pin on the wrong buffer. It seems to be sheer chance that we've not had bugs due to this. (The underlying scan does have pin on the right buffer, but one can easily imagine sequences in which the scan could be cleared while the Slot is still assumed valid.) As of CVS tip the consequences could be even worse, because the Slot may contain some pointers to extracted fields of the tuple, and these pointers are now out of sync with the tuple that the Slot really contains. So I think that it's essential that we explicitly update the scan result Slot during ExecRestrPos. It seems to be a good idea also to make the function return the Slot. As far as I can tell, nodeMergeJoin has been depending on the assumption that the physical address of the result slot doesn't change during Restore. Which is true for all the current plan types, but since the ExecProcNode API isn't designed to assume that a node always returns the same Slot, it doesn't seem like ExecRestrPos should either. regards, tom lane
I wrote: > Currently, since nothing is explicitly done to the result Slot of a > plan node when we restore its position, you might think that the Slot > still points at the tuple that was current just before the Restore. > You'd be wrong though, at least for seqscan and indexscan plans > (I haven't looked yet at the other node types that support > mark/restore). Actually, on looking closer, only seqscans have this problem --- and ExecSeqRestrPos is really dead code anyway at the moment. So rather than go through a large exercise to change the mark/restore API, I've just added some comments about what the API actually guarantees, and tweaked ExecSeqRestrPos to clear the output slot instead of leaving it in a potentially inconsistent state. regards, tom lane
On Sun, 2005-05-15 at 15:09 -0400, Tom Lane wrote: > I'm planning to change ExecRestrPos and the routines it calls so that > an updated TupleTableSlot holding the restored-to tuple is explicitly > returned. > > Currently, since nothing is explicitly done to the result Slot of a > plan node when we restore its position, you might think that the Slot > still points at the tuple that was current just before the Restore. > You'd be wrong though, at least for seqscan and indexscan plans > (I haven't looked yet at the other node types that support > mark/restore). The reason is that the restore operation changes > the contents of a HeapTupleData struct in the scan state (rs_ctup > or xs_ctup) and all that the Slot really contains is a pointer to > that struct. > > Now this is really bad. In the first place, the Slot thinks it > has a pin on the buffer containing its current tuple. After a > Restore, it may have pin on the wrong buffer. Sounds terrible. Is this a particular bug you're fixing? > It seems to be > sheer chance that we've not had bugs due to this. It isn't a very common case, thats why. You'd need to have value duplicates in both joined columns, which is effectively a product join. Granted it is syntactically allowable SQL. AFAICS all joins should be between relations 1:1 or 1:M. A direct M:M join is actually missing out the associative relation, or a non-key self join. Such a join would rarely if ever have any correct and real meaning. I can think of a few, but mostly its just incorrectly coded SQL, or use of special values (e.g. blanks) rather than NULLs. So my guess is that ExecRestrPos is almost never called. > (The underlying > scan does have pin on the right buffer, but one can easily imagine > sequences in which the scan could be cleared while the Slot is > still assumed valid.) As of CVS tip the consequences could be > even worse, because the Slot may contain some pointers to extracted > fields of the tuple, and these pointers are now out of sync with > the tuple that the Slot really contains. > > So I think that it's essential that we explicitly update the scan > result Slot during ExecRestrPos. Yeh, no problem. Best Regards, Simon Riggs