Thread: Don't codegen deform code for virtual tuples in expr eval for scan fetch

Don't codegen deform code for virtual tuples in expr eval for scan fetch

From
Soumyadeep Chakraborty
Date:
Hello Hackers,

This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):

* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.

Not only should we skip the nvalid check if the tuple is virtual, the
whole basic block should be a no-op. There is no deforming to be done,
JIT (slot_compile_deform) or otherwise (slot_getsomeattrs) for virtual
tuples.

Attached is a patch 0001 that achieves exactly that by:

1. Performing the check at the very beginning of the case.
2. Emitting an unconditional branch to the next opblock if we have a
virtual tuple. We end up with a block with just the single
unconditional branch instruction in this case. This block is optimized
away when we run llvm_optimize_module().

Also enclosed is another patch 0002 that adds on some minor
refactoring of conditionals around whether to JIT deform or not.

---
Soumyadeep Chakraborty
Attachment

Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Soumyadeep Chakraborty
Date:
Hello,

In my previous patch 0001, the resulting opblock consisted of a single
br instruction to it's successor opblock. Such a block represents
unnecessary overhead. Even though such a block would be optimized
away, what if optimization is not performed (perhaps due to
jit_optimize_above_cost)? Perhaps we could be more aggressive. We
could maybe remove the opblock altogether. However, such a solution
is not without complexity.

Such unlinking is non-trivial, and is one of the things the
simplify-cfg pass takes care of. One solution could be to run this
pass ad-hoc for the evalexpr function. Or we could perform the
unlinking during codegen. Such a solution is presented below.

To unlink the current opblock from
the cfg we have to replace all of its uses. From the current state of
the code, these uses are either:
1. Terminator instruction of opblocks[i - 1]
2. Terminator instruction of some sub-op-block of opblocks[i - 1]. By
sub-op-block, I mean some block that is directly linked from opblock[i
- 1] but not opblocks[i].
3. Terminator instruction of the entry block.

We should replace all of these uses with opblocks[i + 1] and then and
only then can we delete opblocks[i]. My latest patch v2-0001 in my latest
patch set, achieves this.

I guard LLVMReplaceAllUsesWith() with Assert()s to ensure that we
don't accidentally replace non-terminator uses of opblocks[i], should
they be introduced in the future. If these asserts fail in the future,
replacing these non-terminator instructions with undefs constitutes a
commonly adopted solution. Otherwise, we can always fall back to making
opblocks[i] empty with just the unconditional br, as in my previous
patch 0001.

--
Soumyadeep

On Tue, Sep 17, 2019 at 11:54 PM Soumyadeep Chakraborty <soumyadeep2007@gmail.com> wrote:
Hello Hackers,

This is to address a TODO I found in the JIT expression evaluation
code (opcode =
EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):

* TODO: skip nvalid check if slot is fixed and known to
* be a virtual slot.

Not only should we skip the nvalid check if the tuple is virtual, the
whole basic block should be a no-op. There is no deforming to be done,
JIT (slot_compile_deform) or otherwise (slot_getsomeattrs) for virtual
tuples.

Attached is a patch 0001 that achieves exactly that by:

1. Performing the check at the very beginning of the case.
2. Emitting an unconditional branch to the next opblock if we have a
virtual tuple. We end up with a block with just the single
unconditional branch instruction in this case. This block is optimized
away when we run llvm_optimize_module().

Also enclosed is another patch 0002 that adds on some minor
refactoring of conditionals around whether to JIT deform or not.

---
Soumyadeep Chakraborty
Attachment

Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Andres Freund
Date:
Hi,

Thanks for working on this!

On 2019-09-17 23:54:51 -0700, Soumyadeep Chakraborty wrote:
> This is to address a TODO I found in the JIT expression evaluation
> code (opcode =
> EEOP_INNER_FETCHSOME/EEOP_OUTER_FETCHSOME/EEOP_SCAN_FETCHSOME):
> 
> * TODO: skip nvalid check if slot is fixed and known to
> * be a virtual slot.

I now think this isn't actually the right approach. Instead of doing
this optimization just for JIT compilation, we should do it while
generating the ExprState itself. That way we don't just accelerate JITed
programs, but also normal interpreted execution.

IOW, wherever ExecComputeSlotInfo() is called, we should only actually
push the expression step, when ExecComputeSlotInfo does not determine
that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
the same type of slot).

Does that make sense?

Greetings,

Andres Freund



Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Andres Freund
Date:
Hi,

On 2019-09-20 22:19:46 -0700, Soumyadeep Chakraborty wrote:
> In my previous patch 0001, the resulting opblock consisted of a single
> br instruction to it's successor opblock. Such a block represents
> unnecessary overhead. Even though such a block would be optimized
> away, what if optimization is not performed (perhaps due to
> jit_optimize_above_cost)? Perhaps we could be more aggressive. We
> could maybe remove the opblock altogether. However, such a solution
> is not without complexity.

I'm doubtful this is worth the complexity - and not that we already have
plenty other places with zero length blocks.

WRT jit_optimize_above_cost not triggering, I think we need to replace
the "basic, unoptimized" codegen path with one that does cheap
optimizations only. The reason we don't do the full optimizations all
the time is that they're expensive, but there's enough optimizations
that are pretty cheap.  At some point we'll probably need our own
optimization pipeline, but I don't want to maintain that right now
(i.e. if some other people want to help with this aspect, cool)...

See also: https://www.postgresql.org/message-id/20190904152438.pv4vdk3ctuvvnxh3%40alap3.anarazel.de

Greetings,

Andres Freund



Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Soumyadeep Chakraborty
Date:
Hello Andres,

Thank you very much for reviewing my patch!

On Wed, Sep 25, 2019 at 1:02 PM Andres Freund <andres@anarazel.de> wrote:
> IOW, wherever ExecComputeSlotInfo() is called, we should only actually
> push the expression step, when ExecComputeSlotInfo does not determine
> that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
> the same type of slot).
>
> Does that make sense?

That is a great suggestion and I totally agree. I have attached a patch
that achieves this.

--
Soumyadeep
Attachment

Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Soumyadeep Chakraborty
Date:
Hi Andres,

Thank you for your insight and the link offered just the context I needed!

On Wed, Sep 25, 2019 at 1:06 PM Andres Freund <andres@anarazel.de> wrote:
> I'm doubtful this is worth the complexity - and not that we already have
> plenty other places with zero length blocks.

Agreed.

On Wed, Sep 25, 2019 at 1:06 PM Andres Freund <andres@anarazel.de> wrote:
> WRT jit_optimize_above_cost not triggering, I think we need to replace
> the "basic, unoptimized" codegen path with one that does cheap
> optimizations only. The reason we don't do the full optimizations all
> the time is that they're expensive, but there's enough optimizations
> that are pretty cheap.  At some point we'll probably need our own
> optimization pipeline, but I don't want to maintain that right now
> (i.e. if some other people want to help with this aspect, cool)...

I would absolutely love to work on this!

I was thinking the same. Perhaps not only replace the default on the
compile path, but also introduce additional levels. These levels could then
be configured at a granularity lower than the -O0, -O1, .. In other words,
perhaps we could have levels with ad-hoc pass combinations. We could then
maybe have costs associated with each level. I think such a framework
would be ideal.

To summarize this into TODOs:
1. Tune default compilation path optimizations.
2. New costs per optimization level.
3. Capacity to define "levels" with ad-hoc pass combinations that are costing
sensitive.

Is this what you meant by "optimization pipeline"?

--
Soumyadeep

Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Andres Freund
Date:
Hi,

On 2019-09-25 22:11:51 -0700, Soumyadeep Chakraborty wrote:
> Thank you very much for reviewing my patch!
> 
> On Wed, Sep 25, 2019 at 1:02 PM Andres Freund <andres@anarazel.de> wrote:
> > IOW, wherever ExecComputeSlotInfo() is called, we should only actually
> > push the expression step, when ExecComputeSlotInfo does not determine
> > that a) the slot is virtual, b) and fixed (i.e. guaranteed to always be
> > the same type of slot).
> >
> > Does that make sense?
> 
> That is a great suggestion and I totally agree. I have attached a patch
> that achieves this.

I think as done, this has the slight disadvantage of removing the
fast-path for small interpreted expressions, because that explicitly
matches for seeing the EEOP_*_FETCHSOME ops. Look at execExprInterp.c,
around:
    /*
     * Select fast-path evalfuncs for very simple expressions.  "Starting up"
     * the full interpreter is a measurable overhead for these, and these
     * patterns occur often enough to be worth optimizing.
     */
    if (state->steps_len == 3)
    {

So I think we'd have to add a separate fastpath for virtual slots for
that.

What do you think about the attached?

Greetings,

Andres Freund

Attachment

Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Soumyadeep Chakraborty
Date:
Hey,

I completely agree, that was an important consideration.

I had some purely cosmetic suggestions:
1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.
2. Extract return value to a bool variable for slightly better
readability.
3. Taking the opportunity to use TTS_IS_VIRTUAL.

v2 of patch set attached. The first two patches are unchanged, the cosmetic
changes are part of v2-0003-Some-minor-cosmetic-changes.patch.

--
Soumyadeep
Attachment

Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Andres Freund
Date:
Hi,

On 2019-09-27 23:01:05 -0700, Soumyadeep Chakraborty wrote:
> I completely agree, that was an important consideration.
> 
> I had some purely cosmetic suggestions:
> 1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.

How does renaming it do so? I feel like the asserts are a good idea
independent of anything else?


> 2. Extract return value to a bool variable for slightly better
> readability.

To me that seems clearly worse. The variable doesn't add anything, but
needing to track more state.


> 3. Taking the opportunity to use TTS_IS_VIRTUAL.

Good point.

- Andres



Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Soumyadeep Chakraborty
Date:
Hi Andres,

I don't feel very strongly about the changes I proposed.

> > I completely agree, that was an important consideration.
> >
> > I had some purely cosmetic suggestions:
> > 1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.
>
> How does renaming it do so? I feel like the asserts are a good idea
> independent of anything else?

I felt that encoding the restriction that the function is meant to be called
only in the context of fetch operations in the function name itself
ensured that we don't call it from a non-fetch operation - something the
asserts within ExecComputeSlotInfo() are guarding against.

>
> > 2. Extract return value to a bool variable for slightly better
> > readability.
>
> To me that seems clearly worse. The variable doesn't add anything, but
> needing to track more state.>

Agreed.

--
Soumyadeep

Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Andres Freund
Date:
Hi,

On 2019-09-30 09:14:45 -0700, Soumyadeep Chakraborty wrote:
> I don't feel very strongly about the changes I proposed.
> 
> > > I completely agree, that was an important consideration.
> > >
> > > I had some purely cosmetic suggestions:
> > > 1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.
> >
> > How does renaming it do so? I feel like the asserts are a good idea
> > independent of anything else?
> 
> I felt that encoding the restriction that the function is meant to be called
> only in the context of fetch operations in the function name itself
> ensured that we don't call it from a non-fetch operation - something the
> asserts within ExecComputeSlotInfo() are guarding against.
> 
> >
> > > 2. Extract return value to a bool variable for slightly better
> > > readability.
> >
> > To me that seems clearly worse. The variable doesn't add anything, but
> > needing to track more state.>
> 
> Agreed.

I pushed this to master. Thanks for your contribution!

Regards,

Andres



Re: Don't codegen deform code for virtual tuples in expr eval forscan fetch

From
Soumyadeep Chakraborty
Date:
Awesome! Thanks so much for all the review! :)

--
Soumyadeep