Thread: Making JIT more granular

Making JIT more granular

From
David Rowley
Date:
Hi,

At the moment JIT compilation, if enabled, is applied to all
expressions in the entire plan.  This can sometimes be a problem as
some expressions may be evaluated many times and warrant being JITted,
but others may only be evaluated a few times, or not at all.

This problem tends to become worse when table partitioning is involved,
as the number of expressions in the plan grows with each partition
present in the plan.  Some partitions may have many rows, making it
worthwhile to JIT their expressions, but others may have few rows or
even no rows, in which case JIT is a waste of effort.

I recall a few cases where people have complained that JIT was too
slow. One case, in particular, is [1].

It would be nice if JIT was more granular about which parts of the
plan it could be enabled for. So I went and did that in the attached.

The patch basically changes the plan-level consideration of whether
JIT should be enabled, and to what level, into a per-plan-node
consideration.  So, instead of considering JIT based on the overall
total_cost of the plan, we consider it based on each plan node's
total_cost.
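
Roughly, the per-node check amounts to something like the sketch below.
To be clear, this is a simplified illustration of the idea rather than
code from the patch; in particular, treating the resulting JIT flags as
a per-node value, and gating all of the flag bits purely on the node's
own cost, are simplifications made here for brevity.

#include "postgres.h"

#include "jit/jit.h"
#include "nodes/plannodes.h"

/*
 * Simplified sketch: decide the JIT flags for a single plan node based
 * on that node's own total_cost rather than the plan's overall cost.
 */
static int
consider_jit_for_node(Plan *plan)
{
    int         jit_flags = 0;

    if (jit_enabled && jit_above_cost >= 0 &&
        plan->total_cost > jit_above_cost)
    {
        jit_flags |= PGJIT_PERFORM | PGJIT_EXPR | PGJIT_DEFORM;

        if (jit_optimize_above_cost >= 0 &&
            plan->total_cost > jit_optimize_above_cost)
            jit_flags |= PGJIT_OPT3;

        if (jit_inline_above_cost >= 0 &&
            plan->total_cost > jit_inline_above_cost)
            jit_flags |= PGJIT_INLINE;
    }

    return jit_flags;
}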

I was just playing around with a test case of:

create table listp(a int, b int) partition by list(a);
select 'create table listp'|| x || ' partition of listp for values in('||x||');' from generate_Series(1,1000) x;
\gexec
insert into listp select 1,x from generate_series(1,100000000) x;
vacuum analyze listp;

explain (analyze, buffers) select count(*) from listp where b < 0;

I get:

master jit=on
 JIT:
   Functions: 3002
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 141.587 ms, Inlining 11.760 ms, Optimization 6518.664 ms, Emission 3152.266 ms, Total 9824.277 ms
 Execution Time: 12588.292 ms
(2013 rows)

master jit=off
 Execution Time: 3672.391 ms

patched jit=on
 JIT:
   Functions: 5
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.675 ms, Inlining 3.322 ms, Optimization 10.766 ms, Emission 5.892 ms, Total 20.655 ms
 Execution Time: 2754.160 ms

This EXPLAIN format will need further work, as each of those flags is
now per plan node rather than for the plan as a whole.  I considered
just turning the true/false flags into counters of the number of
functions, e.g. Inlined: 5  Optimized: 5, etc.

I understand from [2] that Andres has WIP code to improve the
performance of JIT compilation. That's really great, but I also
believe that no matter how fast we make it, it's going to be a waste
of effort unless the expressions are evaluated enough times for the
cheaper evaluations to pay off the compilation costs. It'll never be a
win when we evaluate certain expressions zero times. What Andres has
should allow us to drop the default jit costs.

Happy to hear people's thoughts on this.

David

[1] https://www.postgresql.org/message-id/7736C40E-6DB5-4E7A-8FE3-4B2AB8E22793@elevated-dev.com
[2] https://www.postgresql.org/message-id/20200728212806.tu5ebmdbmfrvhoao@alap3.anarazel.de

Attachment

Re: Making JIT more granular

From
David Rowley
Date:
(This is an old thread. See [1] if you're missing the original email.)

On Tue, 4 Aug 2020 at 14:01, David Rowley <dgrowleyml@gmail.com> wrote:
> At the moment JIT compilation, if enabled, is applied to all
> expressions in the entire plan.  This can sometimes be a problem as
> some expressions may be evaluated many times and warrant being JITted,
> but others may only be evaluated a few times, or not at all.
>
> This problem tends to become worse when table partitioning is involved,
> as the number of expressions in the plan grows with each partition
> present in the plan.  Some partitions may have many rows, making it
> worthwhile to JIT their expressions, but others may have few rows or
> even no rows, in which case JIT is a waste of effort.

This patch recently came up again in [2], where Magnus proposed we add
a new GUC [3] to warn users when JIT compilation takes longer than some
specified fraction of execution time.  Over there I mentioned that I
think it might be better to have a go at improving the JIT costing so
that it's more closely aligned with the amount of JIT work there is to
do, rather than with the total cost of the plan, which takes no account
of how much there actually is to JIT compile.

In [4], Andres reminded me that I need to account for the number of
times a given plan is (re)scanned rather than just the total_cost of
the Plan node. There followed some discussion about how that might be
done.

I've loosely implemented this in the attached patch. In order to get
the information about the expected number of "loops" a given Plan node
will be subject to, I've modified create_plan() so that it passes this
value down recursively while creating the plan.  Nodes such as Nested
Loop multiply the "est_calls" by the number of outer rows. For nodes
such as Material, I've made the estimated calls = 1.0.  Memoize must
take into account the expected cache hit ratio, which I've had to
record as part of MemoizePath so that create_plan knows about that.
Altogether, this is a fair bit of churn for createplan.c, and it's
still only part of the way there.  When planning subplans, we do
create_plan() right away and since we plan subplans before the outer
plans, we've no idea how many times the subplan will be rescanned. So
to make this work fully I think we'd need to modify the planner so
that we delay the create_plan() for subplans until sometime after
we've planned the outer query.
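
To give a feel for the shape of it, here's a much-simplified sketch of
how the expected number of calls for a child node could be derived from
its parent while walking down the tree.  The function and the
est_hit_ratio field on MemoizePath are illustrative placeholders for
what the patch records, not the actual names used.

#include "postgres.h"

#include "nodes/pathnodes.h"

/*
 * Simplified sketch: estimate how many times a child plan node will be
 * (re)scanned, given how many times its parent is expected to be called.
 * est_hit_ratio is an assumed field recorded on MemoizePath at path
 * creation time.
 */
static double
est_child_calls(Path *parent_path, double parent_est_calls,
                double outer_rows)
{
    switch (parent_path->pathtype)
    {
        case T_NestLoop:
            /* the inner side is rescanned once per outer row */
            return parent_est_calls * outer_rows;

        case T_Material:
            /* the node below a Material is normally read only once */
            return 1.0;

        case T_Memoize:
            /* only cache misses cause the subnode to be rescanned */
            return parent_est_calls *
                (1.0 - ((MemoizePath *) parent_path)->est_hit_ratio);

        default:
            return parent_est_calls;
    }
}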

The reason that I'm posting about this now is mostly because I did say
I'd come back to this patch for v16, and I'm also feeling bad that I
-1'd Magnus' patch, which likely resulted in zero forward progress
being made on improving JIT and its costing situation for v15.

The reason I've not completed this patch to fix the deficiencies
regarding subplans is that it's quite a bit of work and I don't really
want to do it right now.  We might decide that JIT costing should work
in a completely different way that does not require estimating how many
times a plan node will be rescanned.  I think there's enough of the
patch here to allow us to test this and then decide if it's any good or
not.

There's also maybe some controversy in the patch.  I ended up modifying
EXPLAIN so that it shows loops=N as part of the estimated costs.  I
understand there's likely to be fallout from doing that, as there are
various tools around that this would likely break.  I added it for a
couple of reasons: 1) I think it would be tricky to figure out why JIT
was or was not enabled without showing that in EXPLAIN; and 2) I needed
to display it somewhere for my testing so I could figure out if I'd
done something wrong when calculating the value during create_plan().

This basically looks like:

postgres=# explain select * from h, h h1, h h2;
                                QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop  (cost=0.00..12512550.00 rows=1000000000 width=12)
   ->  Nested Loop  (cost=0.00..12532.50 rows=1000000 width=8)
         ->  Seq Scan on h  (cost=0.00..15.00 rows=1000 width=4)
         ->  Materialize  (cost=0.00..20.00 rows=1000 width=4 loops=1000)
               ->  Seq Scan on h h1  (cost=0.00..15.00 rows=1000 width=4)
   ->  Materialize  (cost=0.00..20.00 rows=1000 width=4 loops=1000000)
         ->  Seq Scan on h h2  (cost=0.00..15.00 rows=1000 width=4)
(7 rows)

Just the same as in EXPLAIN ANALYZE, I've coded loops= to only show
when there's more than 1 loop.  You can also see that the node below
Materialize is not expected to be scanned multiple times.  Technically
it could be, when a parameter changes, but right now it seems like more
trouble than it's worth to estimate that during create_plan().  There's
also some variation between the expected loops and the actual loops
when parallel workers are involved.  In the estimate, this is just the
number of times an average worker is expected to invoke the plan,
whereas the actual "loops" is the sum of each worker's invocations.

The other slight controversy that I can see in the patch is
repurposing the JIT cost GUCs and giving them a completely different
meaning than they had previously. I've left them as-is for now as I
didn't think renaming GUCs would ease the pain that DBAs would have to
endure as a result of this change.

Does anyone have any thoughts about this JIT costing?  Is this an
improvement?  Is there a better way?

David

[1] https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C+ksKFpSdZg=q6sTbtQ-v=aw@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAApHDvrEoQ5p61NjDCKVgEWaH0qm1KprYw2-7m8-6ZGGJ8A2Dw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CABUevExR_9ZmkYj-aBvDreDKUinWLBBpORcmTbuPdNb5vGOLtA%40mail.gmail.com
[4] https://www.postgresql.org/message-id/20220329231641.ai3qrzpdo2vqvwix%40alap3.anarazel.de

Attachment

Re: Making JIT more granular

From
Andy Fan
Date:
Hi David:
 
> Does anyone have any thoughts about this JIT costing?  Is this an
> improvement?  Is there a better way?


I think this is an improvement.  However, I'm not sure how much
improvement it brings versus the effort we'd want to pay for it.  I'm
just sharing my thoughts to start the discussion.
 
1. Ideally no GUC would be needed at all.  For a given operation, like
expression execution or tuple deforming, if we knew the extra cost of
JIT compilation and the cost saved by JITted execution, we could choose
JIT automatically.  But for now it is hard to estimate either, and we
don't have GUCs for the DBA like jit_compile_cost /
jit_compile_tuple_deform_cost either.  It looks like we have a long way
to go on this, and costing is always a headache.

2. You calculate the cost to compare with jit_above_cost as:

plan->total_cost * plan->est_loops.

An alternative way might be to take the rescan cost into account, like
cost_rescan does.  That should be closer to the final execution cost.
However, since it is hard to set a reasonable jit_above_cost anyway, I
feel the current way is OK as well.


3. At the implementation level, I think it would be terrible to have to
add another parameter like est_loops to every create_xxx_plan function
in the future.  An alternative way may be:

/* Extra per-node information passed down while creating plan nodes */
typedef struct ExtPlanInfo
{
    double      est_calls;
} ExtPlanInfo;

/* Copy the extra information onto the newly created plan node */
static void
copyExtPlanInfo(Plan *targetPlan, ExtPlanInfo ext)
{
    targetPlan->est_calls = ext.est_calls;
}

static Plan *
create_xxx_plan(..., ExtPlanInfo extinfo)
{
    ...
    copyExtPlanInfo(plan, extinfo);
    ...
}

This way, it would be easier to add further parameters alongside
est_calls later.  I'm not sure if this is over-engineered.

I have gone through the patches; in general they look good to me.  Once
we have finalized the design, I can do a final double-check.

Lastly, I think the patched approach should be better than the
current one.

--
Best Regards
Andy Fan

Re: Making JIT more granular

From
Andy Fan
Date:

> 2. You calculate the cost to compare with jit_above_cost as:
>
> plan->total_cost * plan->est_loops.
>
> An alternative way might be to take the rescan cost into account, like
> cost_rescan does.  That should be closer to the final execution cost.
> However, since it is hard to set a reasonable jit_above_cost anyway, I
> feel the current way is OK as well.

There are two observations after thinking about this some more.  a) Due
to the rescan cost, plan->total_cost * plan->est_loops might be greater
than the whole plan's total_cost.  This may leave users confused about
why this change makes a plan JITted now when it was not JITted before.

explain analyze select * from t1, t2 where t1.a = t2.a;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..154.25 rows=100 width=16) (actual time=0.036..2.618 rows=100 loops=1)
   Join Filter: (t1.a = t2.a)
   Rows Removed by Join Filter: 9900
   ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=8) (actual time=0.015..0.031 rows=100 loops=1)
   ->  Materialize  (cost=0.00..2.50 rows=100 width=8) (actual time=0.000..0.010 rows=100 loops=100)
         ->  Seq Scan on t2  (cost=0.00..2.00 rows=100 width=8) (actual time=0.007..0.023 rows=100 loops=1)
 Planning Time: 0.299 ms
 Execution Time: 2.694 ms
(8 rows)

The overall plan's total_cost is 154.25, but the Materialize's JIT cost is 2.5 * 100 = 250.

b) Since a plan node's total_cost includes the costs of all of its
children, if one child plan is JITted, I think all of its parents would
be JITted as well.  Is this by design?

         QUERY PLAN
----------------------------
 Sort
   Sort Key: (count(*))
   ->  HashAggregate
         Group Key: a
         ->  Seq Scan on t1

(If the Seq Scan is JITted, both the HashAggregate and the Sort will be JITted.)

--
Best Regards
Andy Fan