Thread: Planned change of ExecRestrPos API

Planned change of ExecRestrPos API

From
Tom Lane
Date:
I'm planning to change ExecRestrPos and the routines it calls so that
an updated TupleTableSlot holding the restored-to tuple is explicitly
returned.

Currently, since nothing is explicitly done to the result Slot of a
plan node when we restore its position, you might think that the Slot
still points at the tuple that was current just before the Restore.
You'd be wrong though, at least for seqscan and indexscan plans
(I haven't looked yet at the other node types that support
mark/restore).  The reason is that the restore operation changes
the contents of a HeapTupleData struct in the scan state (rs_ctup
or xs_ctup) and all that the Slot really contains is a pointer to
that struct.

Now this is really bad.  In the first place, the Slot thinks it
has a pin on the buffer containing its current tuple.  After a
Restore, it may have pin on the wrong buffer.  It seems to be
sheer chance that we've not had bugs due to this.  (The underlying
scan does have pin on the right buffer, but one can easily imagine
sequences in which the scan could be cleared while the Slot is
still assumed valid.)  As of CVS tip the consequences could be
even worse, because the Slot may contain some pointers to extracted
fields of the tuple, and these pointers are now out of sync with
the tuple that the Slot really contains.

So I think that it's essential that we explicitly update the scan
result Slot during ExecRestrPos.

It seems to be a good idea also to make the function return the Slot.
As far as I can tell, nodeMergeJoin has been depending on the assumption
that the physical address of the result slot doesn't change during
Restore.  Which is true for all the current plan types, but since
the ExecProcNode API isn't designed to assume that a node always
returns the same Slot, it doesn't seem like ExecRestrPos should either.
        regards, tom lane


Re: Planned change of ExecRestrPos API

From
Tom Lane
Date:
I wrote:
> Currently, since nothing is explicitly done to the result Slot of a
> plan node when we restore its position, you might think that the Slot
> still points at the tuple that was current just before the Restore.
> You'd be wrong though, at least for seqscan and indexscan plans
> (I haven't looked yet at the other node types that support
> mark/restore).

Actually, on looking closer, only seqscans have this problem --- and
ExecSeqRestrPos is really dead code anyway at the moment.  So rather
than go through a large exercise to change the mark/restore API, I've
just added some comments about what the API actually guarantees, and
tweaked ExecSeqRestrPos to clear the output slot instead of leaving it
in a potentially inconsistent state.
        regards, tom lane


Re: Planned change of ExecRestrPos API

From
Simon Riggs
Date:
On Sun, 2005-05-15 at 15:09 -0400, Tom Lane wrote:
> I'm planning to change ExecRestrPos and the routines it calls so that
> an updated TupleTableSlot holding the restored-to tuple is explicitly
> returned.
> 
> Currently, since nothing is explicitly done to the result Slot of a
> plan node when we restore its position, you might think that the Slot
> still points at the tuple that was current just before the Restore.
> You'd be wrong though, at least for seqscan and indexscan plans
> (I haven't looked yet at the other node types that support
> mark/restore).  The reason is that the restore operation changes
> the contents of a HeapTupleData struct in the scan state (rs_ctup
> or xs_ctup) and all that the Slot really contains is a pointer to
> that struct.
> 
> Now this is really bad.  In the first place, the Slot thinks it
> has a pin on the buffer containing its current tuple.  After a
> Restore, it may have pin on the wrong buffer.  

Sounds terrible. Is this a particular bug you're fixing?

> It seems to be
> sheer chance that we've not had bugs due to this.  

It isn't a very common case, thats why. You'd need to have value
duplicates in both joined columns, which is effectively a product join.
Granted it is syntactically allowable SQL.

AFAICS all joins should be between relations 1:1 or 1:M. A direct M:M
join is actually missing out the associative relation, or a non-key self
join. Such a join would rarely if ever have any correct and real
meaning. I can think of a few, but mostly its just incorrectly coded
SQL, or use of special values (e.g. blanks) rather than NULLs.

So my guess is that ExecRestrPos is almost never called.

> (The underlying
> scan does have pin on the right buffer, but one can easily imagine
> sequences in which the scan could be cleared while the Slot is
> still assumed valid.)  As of CVS tip the consequences could be
> even worse, because the Slot may contain some pointers to extracted
> fields of the tuple, and these pointers are now out of sync with
> the tuple that the Slot really contains.
> 
> So I think that it's essential that we explicitly update the scan
> result Slot during ExecRestrPos.

Yeh, no problem.

Best Regards, Simon Riggs