Another modest proposal:
I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
trigger initial tupleslot de-forming. Certainly we want to have a single
slot_getsomeattrs call per source slot, but as-is, we need a separate
traversal over the expression tree just to precompute the max attribute
number needed. That can't be good for expression compile speed, and
it introduces nontrivial bug risks because the code that does that
is completely decoupled from the code that emits the EEOP_VAR opcodes
(which are what's really relying on the de-forming to have happened).
What do you think about a design like this:
* Drop the FETCHSOME opcodes.
* Add fields to struct ExprState that will hold the maximum inner,
outer, and scan attribute numbers needed.
* ExecInitExpr initializes those fields to zero, and then during
ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g.
state->last_inner_attno = Max(state->last_inner_attno, variable->varattno);
* ExecInitExprSlots, get_last_attnums_walker, etc all go away.
* In the startup segment of ExecInterpExpr, add
if (state->last_inner_attno > 0) slot_getsomeattrs(innerslot, state->last_inner_attno);if (state->last_outer_attno >
0) slot_getsomeattrs(outerslot, state->last_outer_attno);if (state->last_scan_attno > 0)
slot_getsomeattrs(scanslot,state->last_scan_attno);
This would be a little different from the existing code as far as runtime
branch-prediction behavior goes, but it's not apparent to me that it'd be
any worse. Also, for JIT purposes it'd still be entirely possible to
compile the slot_getsomeattrs calls in-line; you'd just be looking to a
different part of the ExprState struct to find out what to do.
regards, tom lane