Re: WIP: Faster Expression Processing v4 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: WIP: Faster Expression Processing v4
Date
Msg-id 20170324182742.3r4qndpugtx6ga3t@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] WIP: Faster Expression Processing v4  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WIP: Faster Expression Processing v4  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: WIP: Faster Expression Processing v4  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2017-03-24 11:26:27 -0400, Tom Lane wrote:
> Another modest proposal:
> 
> I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
> trigger initial tupleslot de-forming.  Certainly we want to have a single
> slot_getsomeattrs call per source slot, but as-is, we need a separate
> traversal over the expression tree just to precompute the max attribute
> number needed.  That can't be good for expression compile speed, and
> it introduces nontrivial bug risks because the code that does that
> is completely decoupled from the code that emits the EEOP_VAR opcodes
> (which are what's really relying on the de-forming to have happened).

Hm.  We had the separate traversal for projections for a long while, and
I don't think there've been a a lot of changes to the extraction of the
last attribute number.  I'm very doubtful that the cost of traversing
the expression twice is meaningful in comparison to the other costs.


> What do you think about a design like this:
> 
> * Drop the FETCHSOME opcodes.
> 
> * Add fields to struct ExprState that will hold the maximum inner,
> outer, and scan attribute numbers needed.
> 
> * ExecInitExpr initializes those fields to zero, and then during
> ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g.
> 
>     state->last_inner_attno = Max(state->last_inner_attno,
>                                   variable->varattno);
> 
> * ExecInitExprSlots, get_last_attnums_walker, etc all go away.
> 
> * In the startup segment of ExecInterpExpr, add
> 
>     if (state->last_inner_attno > 0)
>         slot_getsomeattrs(innerslot, state->last_inner_attno);
>     if (state->last_outer_attno > 0)
>         slot_getsomeattrs(outerslot, state->last_outer_attno);
>     if (state->last_scan_attno > 0)
>         slot_getsomeattrs(scanslot, state->last_scan_attno);
> 
> This would be a little different from the existing code as far as runtime
> branch-prediction behavior goes, but it's not apparent to me that it'd be
> any worse.

I'd be suprised if it weren't.

I'm not super strongly against this setup, but I fail to see the benefit
of whacking this around.  I've benchmarked the previous/current setup
fairly extensively, and I'd rather not redo that.  In contrast to the
other changes you've talked about, this definitely is in the hot-path...


> Also, for JIT purposes it'd still be entirely possible to compile the
> slot_getsomeattrs calls in-line; you'd just be looking to a different
> part of the ExprState struct to find out what to do.

Yea, we could do that.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: pg_serial early wraparound
Next
From: Michael Banck
Date:
Subject: Re: Create replication slot in pg_basebackup if requestedand not yet present