Attached is the current state of a patch to reduce the overhead of passing
tuple data up through many levels of plan nodes. It's not tested enough
to apply yet, but I thought I'd put it out for comment. It seems to get
about a factor of 4 speedup on Miroslav's nested-joins example (above
and beyond what we got from Atsushi Ogawa's patch).

The basic point of the patch is to allow a TupleTableSlot to contain
a "virtual" tuple instead of a regular heap tuple. The virtual tuple
is just an array of Datums, with any pass-by-reference Datums pointing
at original storage (either a lower-level slot or expression result
storage). This representation is essentially the raw output of
ExecProject. This not only avoids the overhead of forming the data into
a tuple (heap_formtuple) but also saves cycles when extracting the
data at the next level up, since we can just grab the Datums directly.
(This behavior builds on and shares code with Ogawa's patch to cache
extracted Datums in TupleTableSlots. When a slot contains a physical
tuple, the same Datum arrays cache any Datums extracted from it.)
Since a slot may or may not contain a regular tuple, you can't just grab
slot->val anymore; there are new API functions ExecCopySlotTuple()
and ExecFetchSlotTuple() (the former when you want to make your own copy,
the latter when you don't). These force construction of a real tuple
if the slot is virtual. I also made an ExecCopySlot() convenience routine
for the common case of copying one slot's contents into another slot.
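
The difference between the two access functions is one of ownership, which
the following toy sketch tries to illustrate. Everything here is hypothetical
(MiniSlot, MiniTuple, and the mini_* functions are invented for the example);
in particular, caching the constructed tuple inside the slot on fetch is an
assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MINI_NATTS 3

typedef struct MiniTuple
{
    long        atts[MINI_NATTS];
} MiniTuple;

typedef struct MiniSlot
{
    long        values[MINI_NATTS]; /* virtual contents */
    MiniTuple  *tuple;              /* constructed tuple, NULL while virtual */
} MiniSlot;

/* Build a freshly allocated tuple from an array of values. */
static MiniTuple *
mini_form_tuple(const long *values)
{
    MiniTuple  *tup = malloc(sizeof(MiniTuple));

    assert(tup != NULL);
    memcpy(tup->atts, values, sizeof(tup->atts));
    return tup;
}

/* "Fetch": the slot keeps ownership; a virtual slot gets a tuple built
 * (and, in this sketch, remembered) on first use. */
static const MiniTuple *
mini_fetch_slot_tuple(MiniSlot *slot)
{
    if (slot->tuple == NULL)
        slot->tuple = mini_form_tuple(slot->values);
    return slot->tuple;
}

/* "Copy": the caller owns the result and must free it; the slot is
 * left unchanged. */
static MiniTuple *
mini_copy_slot_tuple(const MiniSlot *slot)
{
    if (slot->tuple != NULL)
        return mini_form_tuple(slot->tuple->atts);
    return mini_form_tuple(slot->values);
}
```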
A related API modification is to change tuple receivers (DestReceivers)
to receive a TupleTableSlot instead of separate tuple and tuple descriptor
parameters. This makes it possible to avoid an unnecessary tuple
construction/deconstruction at the final output phase as well.
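
As a rough illustration of the receiver change, the sketch below shows a
receiver whose callback takes a slot and reads the Datum arrays directly, so
no tuple has to be formed just to be taken apart again. DemoSlot,
DemoReceiver, and demo_sum_receive() are made-up names for this example; the
actual DestReceiver struct and callback signature may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t DemoDatum;        /* stand-in for Datum (assumption) */

typedef struct DemoSlot
{
    int              natts;
    const DemoDatum *values;
    const bool      *isnull;
} DemoSlot;

typedef struct DemoReceiver DemoReceiver;

struct DemoReceiver
{
    /* one parameter replaces the old (tuple, tuple descriptor) pair */
    void        (*receive_slot) (const DemoSlot *slot, DemoReceiver *self);
    long        rows;               /* rows seen so far */
    long        non_null_sum;       /* sum of all non-null columns */
};

/* Example receiver: consumes the columns straight from the slot arrays. */
static void
demo_sum_receive(const DemoSlot *slot, DemoReceiver *self)
{
    assert(slot != NULL && self != NULL);
    self->rows++;
    for (int i = 0; i < slot->natts; i++)
    {
        if (!slot->isnull[i])
            self->non_null_sum += (long) slot->values[i];
    }
}
```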
It also turned out to be useful to make a short-circuit path for
ExecProject when the targetlist is entirely simple Vars. This only
requires copying Datums from lower to upper slots, and we can
implement it that way instead of going through ExecEvalExpr.
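
The short-circuit path can be pictured roughly as follows. This is a
simplified sketch under stated assumptions: VarSlot, SimpleVar, and
project_simple_vars() are hypothetical, the targetlist is reduced to
(source, attribute number) pairs, and only two input slots are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t VarDatum;         /* stand-in for Datum (assumption) */

#define VAR_MAX_ATTS 4

typedef struct VarSlot
{
    int         natts;
    VarDatum    values[VAR_MAX_ATTS];
    bool        isnull[VAR_MAX_ATTS];
} VarSlot;

/* A targetlist entry that is a plain Var: which input, which column. */
typedef struct SimpleVar
{
    int         source;             /* 0 = outer input, 1 = inner input */
    int         attno;              /* 1-based attribute number */
} SimpleVar;

/* Project by copying Datums from the input slots to the result slot;
 * no expression evaluation is involved. */
static void
project_simple_vars(const SimpleVar *tlist, int ntargets,
                    const VarSlot *outer, const VarSlot *inner,
                    VarSlot *result)
{
    assert(ntargets >= 0 && ntargets <= VAR_MAX_ATTS);
    for (int i = 0; i < ntargets; i++)
    {
        const VarSlot *src = (tlist[i].source == 0) ? outer : inner;
        int         idx = tlist[i].attno - 1;

        assert(idx >= 0 && idx < src->natts);
        result->values[i] = src->values[idx];
        result->isnull[i] = src->isnull[idx];
    }
    result->natts = ntargets;
}
```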
Finally, I have made some progress towards making the tuple access
routines consistently use "bool isNull" arrays as null markers, instead
of the char 'n' or ' ' convention that was previously used in some but
not all contexts. I don't think we can retire heap_formtuple or
heap_modifytuple for a long time, if ever, but we can deprecate them
in favor of the parallel new routines with the bool interface.
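
For callers still holding the old-style markers, the two conventions map onto
each other trivially. A small sketch, taking 'n' to mean null and ' ' to mean
not null; nulls_char_to_bool() is a hypothetical helper for illustration, not
a routine from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Convert an old-style char null-marker array ('n' = null, ' ' = not null)
 * into a bool isNull array of the same length. */
static void
nulls_char_to_bool(const char *nulls, bool *isnull, int natts)
{
    for (int i = 0; i < natts; i++)
    {
        assert(nulls[i] == 'n' || nulls[i] == ' ');
        isnull[i] = (nulls[i] == 'n');
    }
}
```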
Comments?

			regards, tom lane