Hi,
On 2017-03-24 11:26:27 -0400, Tom Lane wrote:
> Another modest proposal:
>
> I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
> trigger initial tupleslot de-forming. Certainly we want to have a single
> slot_getsomeattrs call per source slot, but as-is, we need a separate
> traversal over the expression tree just to precompute the max attribute
> number needed. That can't be good for expression compile speed, and
> it introduces nontrivial bug risks because the code that does that
> is completely decoupled from the code that emits the EEOP_VAR opcodes
> (which are what's really relying on the de-forming to have happened).
Hm. We had the separate traversal for projections for a long while, and
I don't think there've been a a lot of changes to the extraction of the
last attribute number. I'm very doubtful that the cost of traversing
the expression twice is meaningful in comparison to the other costs.
> What do you think about a design like this:
>
> * Drop the FETCHSOME opcodes.
>
> * Add fields to struct ExprState that will hold the maximum inner,
> outer, and scan attribute numbers needed.
>
> * ExecInitExpr initializes those fields to zero, and then during
> ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g.
>
> state->last_inner_attno = Max(state->last_inner_attno,
> variable->varattno);
>
> * ExecInitExprSlots, get_last_attnums_walker, etc all go away.
>
> * In the startup segment of ExecInterpExpr, add
>
> if (state->last_inner_attno > 0)
> slot_getsomeattrs(innerslot, state->last_inner_attno);
> if (state->last_outer_attno > 0)
> slot_getsomeattrs(outerslot, state->last_outer_attno);
> if (state->last_scan_attno > 0)
> slot_getsomeattrs(scanslot, state->last_scan_attno);
>
> This would be a little different from the existing code as far as runtime
> branch-prediction behavior goes, but it's not apparent to me that it'd be
> any worse.
I'd be suprised if it weren't.
I'm not super strongly against this setup, but I fail to see the benefit
of whacking this around. I've benchmarked the previous/current setup
fairly extensively, and I'd rather not redo that. In contrast to the
other changes you've talked about, this definitely is in the hot-path...
> Also, for JIT purposes it'd still be entirely possible to compile the
> slot_getsomeattrs calls in-line; you'd just be looking to a different
> part of the ExprState struct to find out what to do.
Yea, we could do that.
Greetings,
Andres Freund