Thread: JIT compilation per plan node

JIT compilation per plan node

From
Melih Mutlu
Date:
Hi hackers,

After discussing this with David offlist, I decided to reinitiate this discussion that has already been raised and discussed several times in the past. [1] [2]

Currently, if JIT is enabled, the decision for JIT compilation is tied purely to the total cost of the query. The number of expressions to be JIT compiled is not taken into consideration, even though the time spent JITing also depends on that number. This can cause the cost of JITing to become so high that it hurts performance rather than improving it.
An example case is when you have many partitions and run a query that touches only one, or a few, of those partitions. If no partition pruning is done at planning time, all 1000 partitions are JIT compiled, while most of them will not even be executed.

The proposed patch (based on the patch from [1]) simply moves the JIT consideration from the plan level to the per-plan-node level. Instead of depending on the total cost of the query, we decide whether to perform JIT on a node by considering only that node's cost. This allows us to JIT compile only the plan nodes with high costs.

Here is a small test case to see the issue and the benefit of the patch:

CREATE TABLE listp(a int, b int) PARTITION BY LIST(a);
SELECT 'CREATE TABLE listp'|| x || ' PARTITION OF listp FOR VALUES IN ('||x||');' FROM generate_Series(1,1000) x; \gexec 
INSERT INTO listp SELECT 1,x FROM generate_series(1,10000000) x;

EXPLAIN (VERBOSE, ANALYZE) SELECT COUNT(*) FROM listp WHERE b < 0;

master jit=off:
 Planning Time: 25.113 ms
 Execution Time: 315.896 ms

master jit=on:
  Planning Time: 24.664 ms
 JIT:
   Functions: 9008
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 290.705 ms (Deform 108.274 ms), Inlining 0.000 ms, Optimization 165.991 ms, Emission 3232.775 ms, Total 3689.472 ms
 Execution Time: 1612.817 ms
 
patch jit=on:
 Planning Time: 24.055 ms
 JIT:
   Functions: 17
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.463 ms (Deform 0.232 ms), Inlining 0.000 ms, Optimization 0.766 ms, Emission 11.609 ms, Total 13.837 ms
 Execution Time: 299.721 ms


A bit more on what this patch does:
- It introduces a new context to keep track of the estimated number of calls and whether JIT was chosen, for each node the context applies to.
- The estimated number of calls is especially useful where a node is expected to be rescanned, such as Gather, Gather Merge, Memoize and Nested Loop. Knowing the estimated number of calls for a node allows us to rely on the total cost multiplied by the estimated calls, instead of only the total cost of the node.
- For each node, the planner considers whether the node should be JITed. If the cost of the node multiplied by the estimated number of calls is greater than jit_above_cost, the node is JIT compiled (a sketch of this check follows this list). Note that this changes the meaning of jit_above_cost: it is now a threshold for a single plan node rather than for the whole query. Additionally, this change in JIT consideration applies only to JIT compilation; inlining and optimization are still decided for the whole query, based on the query's overall cost.
- EXPLAIN shows the estimated number of "loops" and whether JIT is true or false for each node. In text format, the JIT=true/false information is shown only with VERBOSE. (There is no reason not to show this info without VERBOSE; doing it only for VERBOSE just requires fewer changes in tests, so I did that for simplicity for the moment.)
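
For illustration only, here is a minimal sketch of the shape of that per-node check; the function name and the est_calls parameter are placeholders, and plumbing the estimated call count through plan creation is where most of the actual patch work lies:

#include "postgres.h"

#include "jit/jit.h"
#include "nodes/plannodes.h"

/*
 * Sketch only: decide whether a single plan node should be JIT compiled.
 * "est_calls" stands in for the estimated number of times the node will
 * be (re)scanned, which the patch tracks while building the plan.
 */
static bool
consider_jit_for_node(Plan *plan, double est_calls)
{
    if (!jit_enabled)
        return false;

    /* per-node threshold: node cost scaled by expected (re)scans */
    return plan->total_cost * est_calls > jit_above_cost;
}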


There are also some things that I'm not sure of:
- What are the other places where a node is likely to be rescanned, so that we need to take estimated calls into account properly? Maybe recursive CTEs?
- This change makes jit_above_cost mean something different. Should we rename it or introduce a new GUC? If it is kept as it is, it would probably be better to at least change its default value.
- What can we do for inlining and optimization? AFAIU, performing those per node may be very costly and not make much sense. But I'm not sure how to handle those operations.
- What about parallel queries? The total cost of the node is divided by the number of workers, which can make the cost look quite a bit lower. The patch amplifies the cost by the number of workers (by setting the estimated calls to the number of workers) to make it more likely that JIT is performed for Gather/Gather Merge nodes. OTOH, JIT compilation is performed per worker, and this can make workers decide to JIT compile when it's not really needed.


I'd appreciate any thoughts/feedback.

Thanks,

Re: JIT compilation per plan node

From
Tomas Vondra
Date:
Hi Melih,

On 1/2/24 20:50, Melih Mutlu wrote:
> Hi hackers,
> 
> After discussing this with David offlist, I decided to reinitiate this
> discussion that has already been raised and discussed several times in the
> past. [1] [2]
> 
> Currently, if JIT is enabled, the decision for JIT compilation is purely
> tied to the total cost of the query. The number of expressions to be JIT
> compiled is not taken into consideration, however the time spent JITing
> also depends on that number. This may cause the cost of JITing to become
> too much that it hurts rather than improves anything.
> An example case would be that you have many partitions and run a query that
> touches one, or only a few, of those partitions. If there is no partition
> pruning done in planning time, all 1000 partitions are JIT compiled while
> most will not even be executed.
> 
> Proposed patch (based on the patch from [1]) simply changes consideration
> of JIT from plan level to per-plan-node level. Instead of depending on the
> total cost, we decide whether to perform JIT on a node or not by
> considering only that node's cost. This allows us to only JIT compile plan
> nodes with high costs.
> 
> Here is a small test case to see the issue and the benefit of the patch:
> 
> CREATE TABLE listp(a int, b int) PARTITION BY LIST(a);
> SELECT 'CREATE TABLE listp'|| x || ' PARTITION OF listp FOR VALUES IN
> ('||x||');' FROM generate_Series(1,1000) x; \gexec
> INSERT INTO listp SELECT 1,x FROM generate_series(1,10000000) x;
> 
> EXPLAIN (VERBOSE, ANALYZE) SELECT COUNT(*) FROM listp WHERE b < 0;
> 
> master jit=off:
>  Planning Time: 25.113 ms
>  Execution Time: 315.896 ms
> 
> master jit=on:
>   Planning Time: 24.664 ms
>  JIT:
>    Functions: 9008
>    Options: Inlining false, Optimization false, Expressions true, Deforming
> true
>    Timing: Generation 290.705 ms (Deform 108.274 ms), Inlining 0.000 ms,
> Optimization 165.991 ms, Emission 3232.775 ms, Total 3689.472 ms
>  Execution Time: 1612.817 ms
> 
> patch jit=on:
>  Planning Time: 24.055 ms
>  JIT:
>    Functions: 17
>    Options: Inlining false, Optimization false, Expressions true, Deforming
> true
>    Timing: Generation 1.463 ms (Deform 0.232 ms), Inlining 0.000 ms,
> Optimization 0.766 ms, Emission 11.609 ms, Total 13.837 ms
>  Execution Time: 299.721 ms
> 

Thanks for the updated / refreshed patch.

I think one of the main challenges this patch faces is that there are a
couple of old threads with previous attempts, and the current thread simply
builds on top of them, without explaining things fully. But people either
don't realize that, or don't have time to read the old threads just in case,
and so can't follow some of the decisions :-(

I think it'd be good to maybe try to explain some of the problems and
solutions more thoroughly, or at least point to the relevant places in
the old threads ...

> 
> A bit more on what this patch does:
> - It introduces a new context to keep track of the number of estimated
> calls and if JIT is decided for each node that the context applies.

AFAIK this is an attempt to deal with passing the necessary information
while constructing the plan, which David originally tried [1] doing by
passing est_calls during create_plan ...

I doubt CreatePlanContext is a great way to achieve this. For one, it
breaks the long-standing custom that PlannerInfo is the first parameter,
usually followed by RelOptInfo, etc. CreatePlanContext is added to some
functions (but not all), which makes it ... unpredictable.

FWIW it's not clear to me if/how this solves the problem with early
create_plan() calls for subplans. Or is it still the same?

Wouldn't it be simpler to just build the plan as we do now, and then
have an expression_tree_walker that walks the complete plan top-down,
inspects the nodes, enables JIT where appropriate and so on? That can
have arbitrary context, no problem with that.

Considering we decide JIT pretty late anyway (long after costing and
other stuff that might affect the plan selection), the result should be
exactly the same, without the extensive createplan.c disruption ...

(usual caveat: I haven't tried, maybe there's something that means this
can't work)
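
For what it's worth, a bare-bones sketch of such a walk might look like
the following; the function and the way JIT-worthy nodes are collected
are purely illustrative, and a real version would also have to handle
subplans and carry whatever loop estimates it needs in its context:

#include "postgres.h"

#include "jit/jit.h"
#include "nodes/bitmapset.h"
#include "nodes/plannodes.h"

/*
 * Illustrative only: walk the finished plan tree top-down and record
 * which nodes exceed the (per-node) JIT threshold.  "est_loops" would
 * be adjusted while descending through nodes that rescan their children;
 * subplans/initplans would need separate handling via the PlannedStmt's
 * subplan list.
 */
static void
jit_walk_plan(Plan *plan, double est_loops, Bitmapset **jit_nodes)
{
    if (plan == NULL)
        return;

    if (jit_enabled && plan->total_cost * est_loops > jit_above_cost)
        *jit_nodes = bms_add_member(*jit_nodes, plan->plan_node_id);

    jit_walk_plan(plan->lefttree, est_loops, jit_nodes);
    jit_walk_plan(plan->righttree, est_loops, jit_nodes);
}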


> - The number of estimated calls are especially useful where a node is
> expected to be rescanned, such as Gather. Gather Merge, Memoize and Nested
> Loop. Knowing the estimated number of calls for a node allows us to rely on
> total cost multiplied by the estimated calls instead of only total cost for
> the node.

Not sure I follow. Why would these nodes (Gather, Gather Merge, ...)
be more likely to be rescanned than other nodes?

> - For each node, the planner considers if the node should be JITed. If the
> cost of the node * the number of estimated calls is greater than
> jit_above_cost,  it's decided to be JIT compiled. Note that this changes
> the meaning of jit_above_cost, it's now a threshold for a single plan node
> and not the whole query. Additionally, this change in JIT consideration is
> only for JIT compilations. Inlining and optimizations continue to be for
> the whole query and based on the overall cost of the query.

It's not clear to me why JIT compilation is decided for each node, while
the inlining/optimization is decided for the plan as a whole. I'm not
familiar with the JIT stuff, so maybe it's obvious to others ...

> - EXPLAIN shows estimated number of "loops" and whether JIT is true or not
> for the node. For text format, JIT=true/false information is shown only if
> it's VERBOSE. (no reason to not show this info even if not VERBOSE. Showing
> for only VERBOSE just requires less changes in tests, so I did this for
> simplicity at the moment).
> 

typo in sgml docs: ovarall

> 
> There are also some things that I'm not sure of:
> - What are other places where a node is likely to be rescanned, thus we
> need to take estimated calls into account properly? Maybe recursive CTEs?

Why would it matter if a node is more/less likely to be rescanned?
Either the node is rescanned in the plan or not, and we have nloops to
know how many rescans to expect.

> - This change can make jit_above_cost mean something different. Should we
> rename it or introduce a new GUC? If it'll be kept as it is now, then it
> would probably be better to change its default value at least.

You mean it changes from per-query to per-node threshold? In general I
think we should not repurpose GUCs (because people are likely to copy
and keep the old config, not realizing it works differently). But I'm
not sure this change is different enough to be an issue. And it's in the
opposite direction than usually causes problems (i.e. it would disable
JIT in cases where it was enabled before).

> - What can we do for inlining and optimization? AFAIU performing those per
> node may be very costly and not make that much sense.But I'm not sure about
> how to handle those operations.

Not sure. I don't think I understand JIT details enough to have a good
opinion on this.

> - What about parallel queries? Total cost of the node is divided by the
> number of workers, which can seem like the cost reduced quite a bit. The
> patch amplifies the cost by the number of workers (by setting estimated
> calls to the number of workers) to make it more likely to perform JIT for
> Gather/Gather Merge nodes. OTOH JIT compilations are performed per worker
> and this can make workers decide JIT compile when it's not really needed.
> 

Using the number of workers as the "number of calls" seems wrong to me.
Why shouldn't each worker make the JIT decision on its own, as if it were
the only worker running (but seeing only its fraction of the data)? Kinda
as if there were multiple independent backends running "small" queries?


regards


[1]
https://www.postgresql.org/message-id/CAApHDvoq5VhV%3D2euyjgBN2bC8Bds9Dtr0bG7R%3DreeefJWKJRXA%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compilation per plan node

From
David Rowley
Date:
On Tue, 20 Feb 2024 at 05:26, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I doubt CreatePlanContext is a great way to achieve this. For one, it
> breaks the long-standing custom that PlannerInfo is the first parameter,
> usually followed by RelOptInfo, etc. CreatePlanContext is added to some
> functions (but not all), which makes it ... unpredictable.

I suggested this to Melih as a way to do this based on what Andy wrote
in [1]. I agree with Andy that it's not great to add est_calls to
every function in createplan.c. I feel that CreatePlanContext is a way
to take the hit of adding a parameter once and hopefully never having
to do it again to this degree.  I wondered if PlannerInfo could be a
field in CreatePlanContext.

> FWIW it's not clear to me if/how this solves the problem with early
> create_plan() calls for subplans. Or is it still the same?
>
> Wouldn't it be simpler to just build the plan as we do now, and then
> have an expression_tree_walker that walks the complete plan top-down,
> inspects the nodes, enables JIT where appropriate and so on? That can
> have arbitrary context, no problem with that.

Why walk the entire plan tree again to do something we could do when
building it in the first place?  Recursively walking trees isn't great
from a performance point of view. It would be nice to avoid this if we
can find some other way to handle subplans.  I do have a few other
reasons up my sleeve that subplan creation should be delayed until
later, so maybe we should fix that to unblock those issues.

> Considering we decide JIT pretty late anyway (long after costing and
> other stuff that might affect the plan selection), the result should be
> exactly the same, without the extensive createplan.c disruption ...
>
> (usual caveat: I haven't tried, maybe there's something that means this
> can't work)

It's not like we can look at the top node's cost as a pre-check to
skip the recursive step for cheap plans, as it's perfectly valid for a
node closer to the root of the plan tree to have a lower total cost
than that node's subnodes, e.g. LIMIT 1.

> > - The number of estimated calls are especially useful where a node is
> > expected to be rescanned, such as Gather. Gather Merge, Memoize and Nested
> > Loop. Knowing the estimated number of calls for a node allows us to rely on
> > total cost multiplied by the estimated calls instead of only total cost for
> > the node.
>
> Not sure I follow. Why would be these nodes (Gather, Gather Merge, ...)
> more likely to be rescanned compared to other nodes?

I think Melih is listing nodes that can change the est_calls.  Any
node can be rescanned, but only a subset of nodes can adjust the
number of times they call their subnode vs how many times they
themselves are called.

> > - For each node, the planner considers if the node should be JITed. If the
> > cost of the node * the number of estimated calls is greater than
> > jit_above_cost,  it's decided to be JIT compiled. Note that this changes
> > the meaning of jit_above_cost, it's now a threshold for a single plan node
> > and not the whole query. Additionally, this change in JIT consideration is
> > only for JIT compilations. Inlining and optimizations continue to be for
> > the whole query and based on the overall cost of the query.
>
> It's not clear to me why JIT compilation is decided for each node, while
> the inlining/optimization is decided for the plan as a whole. I'm not
> familiar with the JIT stuff, so maybe it's obvious to others ...

This is a problem with LLVM, IIRC.  The problem is it's a decision
that has to be made for an entire compilation unit and it can't be
decided at the expression level.  This is pretty annoying as it's
pretty hard to decide the best logic to use to enable optimisations
and inlining :-(

I think the best thing we could discuss right now is whether this is the
best way to fix the JIT costing problem.  In [2] I linked to a complaint
about the JIT costings; see [3].  The OP there wanted to keep the plan
private, but I did get to see it and described the problem on the
list.

Also, I don't happen to think the decision about JITting per plan node
is perfect, as the node's total cost can be high for reasons other
than the cost of evaluating expressions.  Also, the number of times a
given expression is evaluated can vary quite a bit based on when the
expression is evaluated.  For example, take a foreign table scan that
does most of the filtering remotely, but has a non-shippable expr that
needs to be evaluated locally.  The foreign scan might be very
expensive, especially if lots of filtering is done by a Seq Scan on the
remote side, yet not many rows might make it back to the local server
to benefit from JITting the non-shippable expression.

A counter-example is the join condition of a non-parameterized nested
loop.  Those get evaluated n_outer_rows * n_inner_rows times.

I think the savings JIT gives us on evaluation of expressions is going
to be more closely tied to the number of times an expression is
evaluated than the total cost of the node.  However, it's likely more
complex for optimisations and inlining as I imagine the size and
complexity of the comparison function matters too.

It would be good to all agree on how we're going to fix this problem
exactly before Melih gets in too deep fixing the finer details of the
patch. If anyone isn't convinced enough there's a problem with the JIT
costings then I can see if I can dig up other threads where this is
being complained about.

Does anyone want to disagree with the general idea of making the
compilation decision based on the total cost of the node? Or have a
better idea?

David

[1] https://postgr.es/m/CAKU4AWqqSAi%2B-1ZaFawY300WknH79J9dhx%3DpU5%2BbyAbShjUjCQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/7736C40E-6DB5-4E7A-8FE3-4B2AB8E22793@elevated-dev.com



Re: JIT compilation per plan node

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 20 Feb 2024 at 05:26, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Wouldn't it be simpler to just build the plan as we do now, and then
>> have an expression_tree_walker that walks the complete plan top-down,
>> inspects the nodes, enables JIT where appropriate and so on? That can
>> have arbitrary context, no problem with that.

> Why walk the entire plan tree again to do something we could do when
> building it in the first place?

FWIW, I seriously doubt that an extra walk of the plan tree is even
measurable compared to the number of cycles JIT compilation will
expend if it's called.  So I don't buy your argument here.
We would be better off to do this in a way that's clean and doesn't
add overhead for non-JIT-enabled builds.

            regards, tom lane



Re: JIT compilation per plan node

From
David Rowley
Date:
On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I seriously doubt that an extra walk of the plan tree is even
> measurable compared to the number of cycles JIT compilation will
> expend if it's called.  So I don't buy your argument here.
> We would be better off to do this in a way that's clean and doesn't
> add overhead for non-JIT-enabled builds.

The extra walk of the tree would need to be done for every plan, not
just the ones where we do JIT. I'd rather find a way to not add this
extra plan tree walk, especially since the vast majority of cases on
an average instance won't be doing any JIT.

David



Re: JIT compilation per plan node

From
Tomas Vondra
Date:
On 2/20/24 06:38, David Rowley wrote:
> On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I seriously doubt that an extra walk of the plan tree is even
>> measurable compared to the number of cycles JIT compilation will
>> expend if it's called.  So I don't buy your argument here.
>> We would be better off to do this in a way that's clean and doesn't
>> add overhead for non-JIT-enabled builds.
> 
> The extra walk of the tree would need to be done for every plan, not
> just the ones where we do JIT. I'd rather find a way to not add this
> extra plan tree walk, especially since the vast majority of cases on
> an average instance won't be doing any JIT.
> 

I believe Tom was talking about non-JIT-enabled-builds, i.e. builds that
either don't support JIT at all, or where jit=off. Those would certainly
not need the extra walk.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compilation per plan node

From
David Rowley
Date:
On Tue, 20 Feb 2024 at 23:04, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 2/20/24 06:38, David Rowley wrote:
> > On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> FWIW, I seriously doubt that an extra walk of the plan tree is even
> >> measurable compared to the number of cycles JIT compilation will
> >> expend if it's called.  So I don't buy your argument here.
> >> We would be better off to do this in a way that's clean and doesn't
> >> add overhead for non-JIT-enabled builds.
> >
> > The extra walk of the tree would need to be done for every plan, not
> > just the ones where we do JIT. I'd rather find a way to not add this
> > extra plan tree walk, especially since the vast majority of cases on
> > an average instance won't be doing any JIT.
> >
>
> I believe Tom was talking about non-JIT-enabled-builds, i.e. builds that
> either don't support JIT at all, or where jit=off. Those would certainly
> not need the extra walk.

I don't believe so as he talked about the fact that the JIT cycles
would drown out the tree walk.  There are no JIT cycles when the cost
threshold isn't met, but we still incur the cost of walking the plan
tree.

David



Re: JIT compilation per plan node

From
Tomas Vondra
Date:

On 2/20/24 06:14, David Rowley wrote:
> On Tue, 20 Feb 2024 at 05:26, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I doubt CreatePlanContext is a great way to achieve this. For one, it
>> breaks the long-standing custom that PlannerInfo is the first parameter,
>> usually followed by RelOptInfo, etc. CreatePlanContext is added to some
>> functions (but not all), which makes it ... unpredictable.
> 
> I suggested this to Melih as a way to do this based on what Andy wrote
> in [1]. I agree with Andy that it's not great to add est_calls to
> every function in createplan.c. I feel that CreatePlanContext is a way
> to take the hit of adding a parameter once and hopefully never having
> to do it again to this degree.  I wondered if PlannerInfo could be a
> field in CreatePlanContext.
> 

You mean we'd be adding more parameters to CreatePlanContext in the
future? Not sure that's a great idea, there are reasons why we have
separate arguments in function signature and not a single struct.

I think contexts are nice to group attributes with a particular purpose,
not as a replacement for arguments to make function signatures simpler.

>> FWIW it's not clear to me if/how this solves the problem with early
>> create_plan() calls for subplans. Or is it still the same?
>>
>> Wouldn't it be simpler to just build the plan as we do now, and then
>> have an expression_tree_walker that walks the complete plan top-down,
>> inspects the nodes, enables JIT where appropriate and so on? That can
>> have arbitrary context, no problem with that.
> 
> Why walk the entire plan tree again to do something we could do when
> building it in the first place?  Recursively walking trees isn't great
> from a performance point of view. It would be nice to avoid this if we
> can find some other way to handle subplans.  I do have a few other
> reasons up my sleeve that subplan creation should be delayed until
> later, so maybe we should fix that to unblock those issues.
> 
>> Considering we decide JIT pretty late anyway (long after costing and
>> other stuff that might affect the plan selection), the result should be
>> exactly the same, without the extensive createplan.c disruption ...
>>
>> (usual caveat: I haven't tried, maybe there's something that means this
>> can't work)
> 
> It's not like we can look at the top-node's cost as a pre-check to
> skip the recursive step for cheap plans as it's perfectly valid that a
> node closer to the root of the plan tree have a lower total cost than
> that node's subnodes.  e.g LIMIT 1.
> 

I'd argue that's actually a reason to do the precheck, exactly because
of the LIMIT. The fact that some node has high total cost does not
matter if there is LIMIT 1 above it. What matters is which fraction of
the plan we execute, not the total cost.

Imagine you have something like

    -> Limit 1 (cost=0..1 rows=1 ...)
       -> Seqscan (cost=0..100000000 rows=1000000 ...)

I'd argue JIT-ing the seqscan is likely pointless, because on average
we'll execute ~1/1000000 of the scan, and the actual cost will be ~100.

>>> - The number of estimated calls are especially useful where a node is
>>> expected to be rescanned, such as Gather. Gather Merge, Memoize and Nested
>>> Loop. Knowing the estimated number of calls for a node allows us to rely on
>>> total cost multiplied by the estimated calls instead of only total cost for
>>> the node.
>>
>> Not sure I follow. Why would be these nodes (Gather, Gather Merge, ...)
>> more likely to be rescanned compared to other nodes?
> 
> I think Melih is listing nodes that can change the est_calls.  Any
> node can be rescanned, but only a subset of nodes can adjust the
> number of times they call their subnode vs how many times they
> themselves are called.
> 

OK

>>> - For each node, the planner considers if the node should be JITed. If the
>>> cost of the node * the number of estimated calls is greater than
>>> jit_above_cost,  it's decided to be JIT compiled. Note that this changes
>>> the meaning of jit_above_cost, it's now a threshold for a single plan node
>>> and not the whole query. Additionally, this change in JIT consideration is
>>> only for JIT compilations. Inlining and optimizations continue to be for
>>> the whole query and based on the overall cost of the query.
>>
>> It's not clear to me why JIT compilation is decided for each node, while
>> the inlining/optimization is decided for the plan as a whole. I'm not
>> familiar with the JIT stuff, so maybe it's obvious to others ...
> 
> This is a problem with LLVM, IIRC.  The problem is it's a decision
> that has to be made for an entire compilation unit and it can't be
> decided at the expression level.  This is pretty annoying as it's
> pretty hard to decide the best logic to use to enable optimisations
> and inlining :-(
> 
> I think the best thing we could discuss right now is, is this the best
> way to fix the JIT costing problem.  In [2] I did link to a complaint
> about the JIT costings. See [3].  The OP there wanted to keep the plan
> private, but I did get to see it and described the problem on the
> list.
> 
> Also, I don't happen to think the decision about JITting per plan node
> is perfect as the node's total costs can be high for reasons other
> than the cost of evaluating expressions.  Also, the number of times a
> given expression is evaluated can vary quite a bit based on when the
> expression is evaluated.  For example, a foreign table scan that does
> most of the filtering remotely, but has a non-shippable expr that
> needs to be evaluated locally. The foreign scan might be very
> expensive, especially if lots of filtering is done by a Seq Scan and
> not many rows might make it back to the local server to benefit from
> JITting the non-shippable expression.
> 
> A counter-example is the join condition of a non-parameterized nested
> loop.  Those get evaluated  n_outer_rows * n_inner_rows times.
> 
> I think the savings JIT gives us on evaluation of expressions is going
> to be more closely tied to the number of times an expression is
> evaluated than the total cost of the node.  However, it's likely more
> complex for optimisations and inlining as I imagine the size and
> complexity of the comparison function matters too.
> 
> It would be good to all agree on how we're going to fix this problem
> exactly before Melih gets in too deep fixing the finer details of the
> patch. If anyone isn't convinced enough there's a problem with the JIT
> costings then I can see if I can dig up other threads where this is
> being complained about.
> 
> Does anyone want to disagree with the general idea of making the
> compilation decision based on the total cost of the node? Or have a
> better idea?
> 

I certainly agree that the current JIT costing is quite crude, and we've
all seen cases where the decision turns out to not be great. And I think
the plan to make the decisions at the node level makes sense, so +1 to
that in general.

And I think you're right that looking just at the node total cost may
not be sufficient - that we may need a better cost model, considering
how many times an expression is executed and so on. But I think we
should try to do this in smaller steps, meaningful on their own,
otherwise we won't move at all. The two threads linked by Melih are ~4y
old and *nothing* changed since then, AFAIK.

I think it's reasonable to start by moving the decision to the node
level - it's where the JIT happens, anyway. It may not be perfect, but
it seems like a clear improvement. And if we then choose to improve the
"JIT cost model" to address some of the issues you pointed out, surely
that would need to happen at the node level too ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compilation per plan node

From
Robert Haas
Date:
On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I certainly agree that the current JIT costing is quite crude, and we've
> all seen cases where the decision turns out to not be great. And I think
> the plan to make the decisions at the node level makes sense, so +1 to
> that in general.

Seems reasonable to me also.

> And I think you're right that looking just at the node total cost may
> not be sufficient - that we may need a better cost model, considering
> how many times an expression is executed and so on. But I think we
> should try to do this in smaller steps, meaningful on their own,
> otherwise we won't move at all. The two threads linked by Melih are ~4y
> old and *nothing* changed since then, AFAIK.
>
> I think it's reasonable to start by moving the decision to the node
> level - it's where the JIT happens, anyway. It may not be perfect, but
> it seems like a clear improvement. And if we then choose to improve the
> "JIT cost model" to address some of the issues you pointed out, surely
> that would need to happen at the node level too ...

I'm not sure I understand whether you (Tomas) think that this patch is
a good idea or a bad idea as it stands. I read the first of these two
paragraphs to suggest that the patch hasn't really evolved much in the
last few years, perhaps suggesting that if it wasn't good enough to
commit back then, it still isn't now. But the second of these two
paragraphs seems more supportive.

From my own point of view, I definitely agree with David's statement
that what we really want to know is how many times each expression
will be evaluated. If we had that information, or just an estimate, I
think we could make much better decisions in this area. But we don't
have that infrastructure now, and it doesn't seem easy to create, so
it seems to me that what we have to decide now is whether applying a
cost threshold on a per-plan-node basis will produce better or worse
results than making one decision for the whole plan. David's
provided an example of where it does indeed work better back in
https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
- but could there be enough cases where the opposite happens to make
us think that the patch is overall a bad idea?

I personally find that a bit unlikely, although not impossible. I see
a couple of ways that using the per-node cost can distort things -- it
seems like it will tend to heavily feature JIT for "interior" plan
nodes because the cost of a plan node includes it's children -- and as
was mentioned previously, it doesn't really care whether the node cost
is high because of expression evaluation or something else. But
neither of those things seem like they'd be bad enough to make this a
bad way forward over all. For the patch to lose, it seems like we'd
need a case where the overall plan cost would have been high enough to
trigger JIT pre-patch, but most of the benefit would have come from
relatively low-cost nodes that don't get JITted post-patch. The
easiest way for that to happen is if the planner's estimates are off,
but that's not really an argument against this patch as much as it is
an argument that query planning is hard in general.

A slightly subtler way the patch could lose is if the new threshold is
harder to adjust than the old one. For example, imagine that you have
a query that does a Cartesian join. That makes the cost of the input
nodes rather small compared to the cost of the join node, and it also
means that JITting the inner join child in particular is probably
rather important. But if you set jit_above_cost low enough to JIT
that node post-patch, then maybe you'll also JIT a bunch of things
that aren't on the inner side of a nested loop and which might
therefore not really need JIT. Unless I'm missing something, this is a
fairly realistic example of where this patch's approach to costing
could turn out to be painful ... but it's not like the current system
is pain-free either.

I don't really know what's best here, but I'm mildly inclined to
believe that the patch might be a change for the better. I have not
reviewed the implementation and have no comment on whether it's good
or bad from that point of view.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: JIT compilation per plan node

From
David Rowley
Date:
Thanks for chipping in here.

On Fri, 15 Mar 2024 at 08:14, Robert Haas <robertmhaas@gmail.com> wrote:
> A slightly subtler way the patch could lose is if the new threshold is
> harder to adjust than the old one. For example, imagine that you have
> a query that does a Cartesian join. That makes the cost of the input
> nodes rather small compared to the cost of the join node, and it also
> means that JITting the inner join child in particular is probably
> rather important. But if you set join_above_cost low enough to JIT
> that node post-patch, then maybe you'll also JIT a bunch of things
> that aren't on the inner side of a nested loop and which might
> therefore not really need JIT. Unless I'm missing something, this is a
> fairly realistic example of where this patch's approach to costing
> could turn out to be painful ... but it's not like the current system
> is pain-free either.

I think this case would be covered as the cost of the inner side of
the join would be multiplied by the estimated outer-side rows.
Effectively, making this part work is the bulk of the patch as we
currently don't know the estimated number of loops of a node during
create plan.

David



Re: JIT compilation per plan node

From
Tomas Vondra
Date:

On 3/14/24 20:14, Robert Haas wrote:
> On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I certainly agree that the current JIT costing is quite crude, and we've
>> all seen cases where the decision turns out to not be great. And I think
>> the plan to make the decisions at the node level makes sense, so +1 to
>> that in general.
> 
> Seems reasonable to me also.
> 
>> And I think you're right that looking just at the node total cost may
>> not be sufficient - that we may need a better cost model, considering
>> how many times an expression is executed and so on. But I think we
>> should try to do this in smaller steps, meaningful on their own,
>> otherwise we won't move at all. The two threads linked by Melih are ~4y
>> old and *nothing* changed since then, AFAIK.
>>
>> I think it's reasonable to start by moving the decision to the node
>> level - it's where the JIT happens, anyway. It may not be perfect, but
>> it seems like a clear improvement. And if we then choose to improve the
>> "JIT cost model" to address some of the issues you pointed out, surely
>> that would need to happen at the node level too ...
> 
> I'm not sure I understand whether you (Tomas) think that this patch is
> a good idea or a bad idea as it stands. I read the first of these two
> paragraphs to suggest that the patch hasn't really evolved much in the
> last few years, perhaps suggesting that if it wasn't good enough to
> commit back then, it still isn't now. But the second of these two
> paragraphs seems more supportive.
> 

To clarify, I think the patch is a step in the right direction, and a
meaningful improvement. It may not be the perfect solution we imagine
(but who knows how far we are from that), but AFAIK moving these
decisions to the node level is something the ideal solution would need
to do too.

The reference to the 4y old patches was meant to support this patch as
an improvement - perhaps incomplete, but still an improvement. We keep
imagining "perfect solutions" and then end up doing nothing.

I recognize there's a risk we may never get to have the ideal solution
(e.g. because it requires information we don't possess). But I still
think moving the decision to the node level would allow us to do better
decisions compared to just doing it for the query as a whole.

> From my own point of view, I definitely agree with David's statement
> that what we really want to know is how many times each expression
> will be evaluated. If we had that information, or just an estimate, I
> think we could make much better decisions in this area. But we don't
> have that infrastructure now, and it doesn't seem easy to create, so
> it seems to me that what we have to decide now is whether applying a
> cost threshold on a per-plan-node basis will produce better or worse
> results than what making one decision for the whole plan. David's
> provided an example of where it does indeed work better back in
> https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
> - but could there be enough cases where the opposite happens to make
> us think that the patch is overall a bad idea?
> 

Right, this risk of regression is always there, and I'm sure it'd be
possible to construct such cases. But considering how crude the current
costing is, I'd be surprised if this ends up being a net negative.

Also, is the number of executions really the thing we're missing? Surely
we know the number of rows the node is dealing with, so we could use
this (yes, I realize there are issues, but we deal with that when
costing quals too). Isn't it much bigger issue that we have pretty much
no cost model for the actual JIT (compilation/optimization) depending on
how many expressions it deals with?

> I personally find that a bit unlikely, although not impossible. I see
> a couple of ways that using the per-node cost can distort things -- it
> seems like it will tend to heavily feature JIT for "interior" plan
> nodes because the cost of a plan node includes it's children -- and as
> was mentioned previously, it doesn't really care whether the node cost
> is high because of expression evaluation or something else. But
> neither of those things seem like they'd be bad enough to make this a
> bad way forward over all. For the patch to lose, it seems like we'd
> need a case where the overall plan cost would have been high enough to
> trigger JIT pre-patch, but most of the benefit would have come from
> relatively low-cost nodes that don't get JITted post-patch. The
> easiest way for that to happen is if the planner's estimates are off,
> but that's not really an argument against this patch as much as it is
> an argument that query planning is hard in general.
> 
> A slightly subtler way the patch could lose is if the new threshold is
> harder to adjust than the old one. For example, imagine that you have
> a query that does a Cartesian join. That makes the cost of the input
> nodes rather small compared to the cost of the join node, and it also
> means that JITting the inner join child in particular is probably
> rather important. But if you set join_above_cost low enough to JIT
> that node post-patch, then maybe you'll also JIT a bunch of things
> that aren't on the inner side of a nested loop and which might
> therefore not really need JIT. Unless I'm missing something, this is a
> fairly realistic example of where this patch's approach to costing
> could turn out to be painful ... but it's not like the current system
> is pain-free either.
> 
> I don't really know what's best here, but I'm mildly inclined to
> believe that the patch might be a change for the better. I have not
> reviewed the implementation and have no comment on whether it's good
> or bad from that point of view.
> 

I think it would be good to construct a bunch of cases where we think
this approach would misbehave, and see if the current code handles that
any better and/or if there's a way to improve that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compilation per plan node

From
David Rowley
Date:
On Fri, 15 Mar 2024 at 10:13, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> To clarify, I think the patch is a step in the right direction, and a
> meaningful improvement. It may not be the perfect solution we imagine
> (but who knows how far we are from that), but AFAIK moving these
> decisions to the node level is something the ideal solution would need
> to do too.

I got thinking about this patch again while working on [1]. I want to
write this down as I don't quite have time to get fully back into this
right now...

Currently, during execution, ExecCreateExprSetupSteps() traverses the
Node tree of the Expr to figure out the max varattno for each slot.
That's done so all of the tuple deforming happens at once rather than
incrementally. Figuring out the max varattno is a price we have to pay
for every execution of the query. I think we'd be better off doing
that in the planner.

To do this, I thought that setrefs.c could do this processing in
fix_join_expr / fix_upper_expr and wrap up the expression in a new
Node type that stores the max varattno for each special var type.

This idea is related to this discussion because another thing that
could be stored in the very same struct is the "num_exec" value.   I
feel the number of executions of an ExprState is a better gauge of how
useful JIT will be than the cost of the plan node. Now, looking at
set_join_references(), the execution estimates are not exactly
perfect. For example:

#define NUM_EXEC_QUAL(parentplan)   ((parentplan)->plan_rows * 2.0)

that's not a great estimate for a Nested Loop's joinqual, but we could
easily make efforts to improve those and that could likely be done
independently and concurrently with other work to make JIT more
granular.
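
To make the idea a bit more concrete, something like the struct below
(names entirely hypothetical, not from any posted patch) is the kind of
wrapper setrefs.c could produce around each finished expression:

#include "postgres.h"

#include "nodes/primnodes.h"

/*
 * Hypothetical wrapper node, for illustration only: setrefs.c would wrap
 * each finished expression in one of these, so executor startup no longer
 * has to walk the expression to find the max varattnos, and the JIT
 * decision could be driven by num_exec rather than the plan node's cost.
 */
typedef struct PreparedExpr
{
    NodeTag     type;
    Expr       *expr;            /* the wrapped expression tree */
    AttrNumber  max_scan_attno;  /* highest varattno fetched from the scan slot */
    AttrNumber  max_inner_attno; /* highest varattno for INNER_VAR */
    AttrNumber  max_outer_attno; /* highest varattno for OUTER_VAR */
    double      num_exec;        /* estimated evaluations of the expression */
} PreparedExpr;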

The problem with doing this is that there's just a huge amount of code
churn in the executor.  I am keen to come up with a prototype so I can
get a better understanding of if this is a practical solution.  I
don't want to go down that path if it's just me that thinks the number
of times an ExprState is evaluated is a better measure to go on for
JIT vs no JIT than the plan node's total cost.

Does anyone have any thoughts on that idea?

David

[1] https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm=m0L+Gc=T=kEa4g@mail.gmail.com



Re: JIT compilation per plan node

From
Jelte Fennema-Nio
Date:
On Tue, 20 Feb 2024 at 06:38, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 20 Feb 2024 at 18:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > FWIW, I seriously doubt that an extra walk of the plan tree is even
> > measurable compared to the number of cycles JIT compilation will
> > expend if it's called.  So I don't buy your argument here.
> > We would be better off to do this in a way that's clean and doesn't
> > add overhead for non-JIT-enabled builds.
>
> The extra walk of the tree would need to be done for every plan, not
> just the ones where we do JIT. I'd rather find a way to not add this
> extra plan tree walk, especially since the vast majority of cases on
> an average instance won't be doing any JIT.

I'm not saying I'd prefer the extra walk, but I don't think you'd need
to do this extra walk for all plans. Afaict you could skip the extra
walk when top_plan->total_cost < jit_above_cost. i.e. only doing the
extra walk to determine which exact nodes to JIT for cases where we
currently JIT all nodes. That would limit the extra walk overhead to
cases where we currently already spend significant resources on JITing
stuff.



Re: JIT compilation per plan node

From
David Rowley
Date:
On Tue, 14 May 2024 at 19:56, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> I'm not saying I'd prefer the extra walk, but I don't think you'd need
> to do this extra walk for all plans. Afaict you could skip the extra
> walk when top_plan->total_cost < jit_above_cost. i.e. only doing the
> extra walk to determine which exact nodes to JIT for cases where we
> currently JIT all nodes. That would limit the extra walk overhead to
> cases where we currently already spend significant resources on JITing
> stuff.

You could do that, but wouldn't it just cause us to sometimes miss
doing JIT for plan nodes that have a total cost above the top node's?
To me, it seems like a shortcut that someone might complain about one
day and fixing it might require removing the short cut, which would
lead to traversing the whole plan tree.

Here's a plan where the total cost of a subnode is greater than the
total cost of the top node:

set max_parallel_workers_per_gather=0;
create table t0 as select a from generate_Series(1,1000000)a;
analyze t0;
explain select * from t0 order by a limit 1;
                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=19480.00..19480.00 rows=1 width=4)
   ->  Sort  (cost=19480.00..21980.00 rows=1000000 width=4)
         Sort Key: a
         ->  Seq Scan on t0  (cost=0.00..14480.00 rows=1000000 width=4)

Anyway, I don't think it's worth talking in detail about implementation
specifics for the total-cost-of-the-node idea when the whole
replacement costing model design is still undecided. It feels
like we're trying to decide what colour to paint the bathroom when we
haven't even come up with a design for the house yet.

I'd be interested to hear your thoughts on using the estimated number
of invocations of the function to drive the JIT flags on a
per-expression level.

David



Re: JIT compilation per plan node

From
Jelte Fennema-Nio
Date:
On Tue, 14 May 2024 at 10:19, David Rowley <dgrowleyml@gmail.com> wrote:
> Here's a plan where the total cost of a subnode is greater than the
> total cost of the top node:

Ah I didn't realize it was possible for that to happen. **reads up on
plan costs**

This actually makes me think that using the total_cost of the sub-nodes
is not enough to determine whether a node should be JITed.
We wouldn't want to start JITing plans like this, i.e. introducing
all the JIT setup overhead for just a single row:

set max_parallel_workers_per_gather=0;
create table t0 as select a from generate_Series(1,1000000)a;
analyze t0;
explain select a+a*a+a*a+a from t0 limit 1;
                            QUERY PLAN
-----------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=4)
   ->  Seq Scan on t0  (cost=0.00..26980.00 rows=1000000 width=4)

An easy way to work around that issue, I guess, is to use the minimum
total_cost of all the total_costs from the current sub-node up to the
root node. The current minimum could be passed along as part of the
context, I guess.
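
A rough sketch of how that could look, reusing the shape of the plan
walk discussed earlier in the thread (all names illustrative, not from
any patch):

#include "postgres.h"

#include "jit/jit.h"
#include "nodes/bitmapset.h"
#include "nodes/plannodes.h"

/*
 * Illustrative only: carry the smallest total_cost seen on the path from
 * the root, so that a node sitting below a cheap LIMIT is judged by the
 * limited cost rather than by its own (much larger) total cost.
 */
static void
jit_walk_plan_min(Plan *plan, Cost min_path_cost, Bitmapset **jit_nodes)
{
    Cost        effective_cost;

    if (plan == NULL)
        return;

    effective_cost = Min(plan->total_cost, min_path_cost);

    if (jit_enabled && effective_cost > jit_above_cost)
        *jit_nodes = bms_add_member(*jit_nodes, plan->plan_node_id);

    jit_walk_plan_min(plan->lefttree, effective_cost, jit_nodes);
    jit_walk_plan_min(plan->righttree, effective_cost, jit_nodes);
}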



Re: JIT compilation per plan node

From
David Rowley
Date:
Thanks for taking a look and thinking about this problem.

On Fri, 20 Dec 2024 at 03:49, Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
> static void
> plan_consider_jit(Plan *plan)
> {
>     plan->jit = false;
>
>     if (jit_enabled)
>     {
>         Cost    total_cost;
>
>         total_cost = plan->total_cost * plan->plan_rows;
>
>         if (total_cost > jit_above_cost)
>             plan->jit = true;
>     }
> }

Unfortunately, it's not that easy. "plan_rows" isn't the estimated
number of times the node will be rescanned; it's the number of rows we
expect it to return. Multiplying plan->total_cost by that yields a
nonsensical value.

I made an attempt in [1] to get the nloop estimate down to where it's
needed so we could multiply the total_cost by that. IIRC, a few people
weren't happy about the churn that caused. I didn't come up with an
alternative.

Generally, I think that going with the number of expected evaluations
of the expression is a more flexible option. The total_cost *
est_nloops of a plan node really isn't a great indication of how much
effort will be spent evaluating the expression. The join filter on a
non-parameterised nested loop is likely the upper extreme end, and
something like a join filter on a parameterized nested loop or a
Bitmap Heap Scan recheck condition is maybe the lower extreme, where
the expr eval cost just isn't that well correlated with the total node
cost.

I'll restate we don't need perfect right away, but maybe if we come up
with infrastructure that can be incrementally improved, that would be
the best direction to move towards. My best guess at how that might
work was in the email you replied to. However, I admit to not having
spent much time relooking at it or even thinking about it in the past
few years.

David

[1] https://postgr.es/m/CAApHDvoq5VhV%3D2euyjgBN2bC8Bds9Dtr0bG7R%3DreeefJWKJRXA%40mail.gmail.com