Thread: bad JIT decision

bad JIT decision

From: Scott Ribe
I have come across a case where PG 12 with default JIT settings makes a dramatically bad decision. PG 11 without JIT
executes the query in <1 ms; PG 12 with JIT takes 7 s, and EXPLAIN ANALYZE attributes all that time to JIT. (The plan is
the same on both 11 & 12; it's just the JIT.)

It is a complex query, with joins to subqueries etc.; there is a decent amount of data (~50M rows), and around 80
partitions (by date) on the main table. The particular query that I'm testing is intended as a sort of base case, in
that it queries on a small set (4) of unique ids which will not match any rows, thus the complex bits never get
executed, and this is reflected in the plan, where the innermost section is:

->  Index Scan using equities_rds_id on equities e0  (cost=0.42..33.74 rows=1 width=37) (actual time=6751.892..6751.892 rows=0 loops=1)
   Index Cond: (rds_id = ANY ('{..., ..., ..., ...}'::uuid[]))
   Filter: (security_type = 'ETP'::text)
   Rows Removed by Filter: 4

And that is ultimately followed by a couple of sets of 80-ish scans of partitions, which show never executed, pretty
much as expected since there are no rows left to check. The final bit is:

JIT:
  Functions: 683
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 86.439 ms, Inlining 21.994 ms, Optimization 3900.318 ms, Emission 2561.409 ms, Total 6570.161 ms

Now I think the query is not so complex that there could possibly be 683 distinct functions. I think this count must be
the result of a smaller number of functions created per partition. I can understand how that would make sense, and some
testing in which I added conditions that would restrict the matches to a single partition seems to bear it out (JIT
reports 79 functions in that case).

Given the magnitude of the miss in using JIT here, I am wondering: is it possible that the planner does not properly
take into account the cost of JIT'ing a function for multiple partitions? Or is it that the planner doesn't have enough
info about the restrictiveness of conditions, and is therefore anticipating running the functions against a great many
rows?

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/






Re: bad JIT decision

From: David Rowley
On Sat, 25 Jul 2020 at 08:46, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> Given the magnitude of the miss in using JIT here, I am wondering: is it possible that the planner does not properly
> take into account the cost of JIT'ing a function for multiple partitions? Or is it that the planner doesn't have enough
> info about the restrictiveness of conditions, and is therefore anticipating running the functions against a great many
> rows?

It does not really take into account the cost of jitting. If the total
plan cost is above the jit threshold then jit is enabled. If not, then
it's disabled.

There are various levels of jit and various thresholds that can be tweaked, see:

select name,setting from pg_settings where name like '%jit%';

But as far as each threshold goes, you either reach it or you don't.
Maybe that could be made better by considering jit in a more cost-based
way rather than by threshold; that way it might be possible to
consider jit per plan node rather than for the query as a whole. E.g.,
if you have 1000 partitions and 999 of them have 1 row and the final
one has 1 billion rows, then it's likely a waste of time to jit
expressions for the 999 partitions.

However, for now, you might just want to try raising the various jit
thresholds so that jit is only enabled for more expensive plans.
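For reference, the settings involved can be inspected and raised like this (the thresholds shown in the comment are the stock v12 defaults per the PostgreSQL documentation; the raised values are just illustrative):

```sql
-- Inspect the current JIT-related settings:
SELECT name, setting FROM pg_settings WHERE name LIKE '%jit%';

-- Raise the thresholds so only genuinely expensive plans get JITed.
-- v12 defaults: jit_above_cost = 100000,
-- jit_inline_above_cost = 500000, jit_optimize_above_cost = 500000.
SET jit_above_cost = 1000000;
SET jit_inline_above_cost = 5000000;
SET jit_optimize_above_cost = 5000000;
```

Setting any of the cost thresholds to -1 disables that step entirely.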

David



Re: bad JIT decision

From: Scott Ribe
> On Jul 24, 2020, at 4:26 PM, David Rowley <dgrowleyml@gmail.com> wrote:
>
> It does not really take into account the cost of jitting.

That is what I was missing.

I read about JIT when 12 was pre-release; in re-reading after my post I see that it does not attempt to estimate JIT
cost. And in thinking about it, I realize that it would be next to impossible to anticipate how expensive LLVM
optimization was going to be.

In the case where a set of functions is replicated across partitions, it would be possible to do them once, then
project the cost of the copies. Perhaps for PG 14, as better support for the combination of JIT optimization and
highly-partitioned data ;-)





Re: bad JIT decision

From: Tom Lane
David Rowley <dgrowleyml@gmail.com> writes:
> However, for now, you might just want to try raising various jit
> thresholds so that it only is enabled for more expensive plans.

Yeah.  I'm fairly convinced that the v12 defaults are far too low,
because we are constantly seeing complaints of this sort.

            regards, tom lane



Re: bad JIT decision

From: David Rowley
On Sat, 25 Jul 2020 at 10:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > However, for now, you might just want to try raising various jit
> > thresholds so that it only is enabled for more expensive plans.
>
> Yeah.  I'm fairly convinced that the v12 defaults are far too low,
> because we are constantly seeing complaints of this sort.

I think plan cost overestimation is a common cause of unwanted jit too.

It would be good to see the EXPLAIN ANALYZE so we knew if that was the
case here.

David



Re: bad JIT decision

From: Scott Ribe
> On Jul 24, 2020, at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Yeah.  I'm fairly convinced that the v12 defaults are far too low,
> because we are constantly seeing complaints of this sort.


They are certainly too low for our case; I'm not sure whether they're also way too low for folks who are not partitioning.

The passive-aggressive approach would really not be good general advice, but I'm actually glad that in our case
the defaults were low enough to get our attention early ;-)

I think I will disable optimization, because with our partitioning scheme we will commonly see blow-ups of optimization
time like this one.
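(For the record, the optimization step can be switched off on its own; per the docs, setting its cost threshold to -1 disables that step while leaving JIT compilation itself enabled:)

```sql
-- Keep JIT compilation, but never run the expensive LLVM optimization pass:
SET jit_optimize_above_cost = -1;
-- Inlining can likewise be disabled independently:
-- SET jit_inline_above_cost = -1;
```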

The inlining time in this case is still much more than the query itself, but it is low enough not to be noticed by users,
and I think that with different variations of the parameters coming into the query, for the slower versions (more
partitions requiring actual scans), inlining will help. Slowing down the fastest while speeding up the slower is a
trade-off we can take.


Re: bad JIT decision

From: Andres Freund
Hi,

On 2020-07-24 18:37:02 -0400, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > However, for now, you might just want to try raising various jit
> > thresholds so that it only is enabled for more expensive plans.
> 
> Yeah.  I'm fairly convinced that the v12 defaults are far too low,
> because we are constantly seeing complaints of this sort.

I think the issue is more that we need to take into account that the
overhead of JITing scales ~linearly with the number of JITed
expressions. And that's not done right now.  I've had a patch somewhere
that had a prototype implementation of changing the costing to be
#expressions * some_cost, and I think that's a lot more accurate.

Greetings,

Andres Freund



Re: bad JIT decision

From: Andres Freund
Hi,

On Fri, Jul 24, 2020, at 15:32, Scott Ribe wrote:
> > On Jul 24, 2020, at 4:26 PM, David Rowley <dgrowleyml@gmail.com> wrote:
> > 
> > It does not really take into account the cost of jitting.
> 
> That is what I was missing.
> 
> I read about JIT when 12 was pre-release; in re-reading after my post I 
> see that it does not attempt to estimate JIT cost. And in thinking 
> about it, I realize that would be next to impossible to anticipate how 
> expensive LLVM optimization was going to be.

We certainly can do better than now.

> In the case where a set of functions is replicated across partitions, 
> it would be possible to do them once, then project the cost of the 
> copies. 

Probably not - JITing functions separately is more expensive than doing them once... The bigger benefit there is to
avoid optimizing functions that are likely to be the same.
 

> Perhaps for PG 14 as better support for the combination of JIT 
> optimization and highly-partitioned data ;-)

If I posted a few patches to test / address some of these issues, could you test them with your schema & queries?

Regards,

Andres



Re: bad JIT decision

From: David Rowley
On Sat, 25 Jul 2020 at 10:42, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sat, 25 Jul 2020 at 10:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > David Rowley <dgrowleyml@gmail.com> writes:
> > > However, for now, you might just want to try raising various jit
> > > thresholds so that it only is enabled for more expensive plans.
> >
> > Yeah.  I'm fairly convinced that the v12 defaults are far too low,
> > because we are constantly seeing complaints of this sort.
>
> I think plan cost overestimation is a common cause of unwanted jit too.
>
> It would be good to see the EXPLAIN ANALYZE so we knew if that was the
> case here.

So Scott did send me the full EXPLAIN ANALYZE for this privately. He
wishes to keep the full output private.

After looking at it, it seems the portion that he pasted above, aka:

->  Index Scan using equities_rds_id on equities e0  (cost=0.42..33.74
rows=1 width=37) (actual time=6751.892..6751.892 rows=0 loops=1)
   Index Cond: (rds_id = ANY ('{..., ..., ..., ...}'::uuid[]))
   Filter: (security_type = 'ETP'::text)
   Rows Removed by Filter: 4

Is nested at the bottom level join, about 6 joins deep.  The lack of
any row being found results in upper level joins not having to do
anything, and the majority of the plan is (never executed).

David



Re: bad JIT decision

From: Tom Lane
David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, 25 Jul 2020 at 10:42, David Rowley <dgrowleyml@gmail.com> wrote:
>> I think plan cost overestimation is a common cause of unwanted jit too.
>> It would be good to see the EXPLAIN ANALYZE so we knew if that was the
>> case here.

> So Scott did send me the full EXPLAIN ANALYZE for this privately. He
> wishes to keep the full output private.

So ... what was the *top* line, ie total cost estimate?

            regards, tom lane



Re: bad JIT decision

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> On 2020-07-24 18:37:02 -0400, Tom Lane wrote:
>> Yeah.  I'm fairly convinced that the v12 defaults are far too low,
>> because we are constantly seeing complaints of this sort.

> I think the issue is more that we need to take into account that the
> overhead of JITing scales ~linearly with the number of JITed
> expressions. And that's not done right now.  I've had a patch somewhere
> that had a prototype implementation of changing the costing to be
> #expressions * some_cost, and I think that's a lot more accurate.

Another thing we could try with much less effort is scaling it by the
number of relations in the query.  There's already some code in the
plancache that tries to estimate planning effort that way, IIRC.
Such a scaling would be very legitimate for the cost of compiling
tuple-deconstruction code, and for other expressions it'd kind of
amount to an assumption that the expressions-per-table ratio is
roughly constant.  If you don't like that, maybe some simple
nonlinear growth rule would work.
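A minimal sketch of such a scaling rule (names and the linear form are illustrative assumptions, not PostgreSQL source):

```python
def jit_enabled(total_cost: float, num_rels: int,
                jit_above_cost: float = 100000.0) -> bool:
    """Trigger JIT only when the plan cost clears a threshold that grows
    with the number of relations in the query, so a many-partition plan
    needs a proportionally higher cost before JIT kicks in.  A mild
    nonlinear variant could scale by e.g. num_rels ** 0.5 instead."""
    return total_cost > jit_above_cost * num_rels
```

With the plan cost reported later in this thread (~1.46M) spread over ~80 partitions, this linear rule would leave JIT off, since 1.46M < 100000 * 80.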

            regards, tom lane



Re: bad JIT decision

From: Tom Lane
David Rowley <dgrowleyml@gmail.com> writes:
> ... nested at the bottom level join, about 6 joins deep.  The lack of
> any row being found results in upper level joins not having to do
> anything, and the majority of the plan is (never executed).

On re-reading this, that last point struck me forcibly.  If most of
the plan never gets executed, could we avoid compiling it?  That is,
maybe JIT isn't JIT enough, and we should make compilation happen
at first use of an expression not during executor startup.

            regards, tom lane



Re: bad JIT decision

From: David Rowley
On Sun, 26 Jul 2020 at 02:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Sat, 25 Jul 2020 at 10:42, David Rowley <dgrowleyml@gmail.com> wrote:
> >> I think plan cost overestimation is a common cause of unwanted jit too.
> >> It would be good to see the EXPLAIN ANALYZE so we knew if that was the
> >> case here.
>
> > So Scott did send me the full EXPLAIN ANALYZE for this privately. He
> > wishes to keep the full output private.
>
> So ... what was the *top* line, ie total cost estimate?

Hash Right Join  (cost=1200566.17..1461446.31 rows=1651 width=141)
(actual time=5881.944..5881.944 rows=0 loops=1)

So, well above the standard jit inline and optimize cost thresholds.

David



Re: bad JIT decision

From: David Rowley
On Sun, 26 Jul 2020 at 02:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-07-24 18:37:02 -0400, Tom Lane wrote:
> >> Yeah.  I'm fairly convinced that the v12 defaults are far too low,
> >> because we are constantly seeing complaints of this sort.
>
> > I think the issue is more that we need to take into account that the
> > overhead of JITing scales ~linearly with the number of JITed
> > expressions. And that's not done right now.  I've had a patch somewhere
> > that had a prototype implementation of changing the costing to be
> > #expressions * some_cost, and I think that's a lot more accurate.
>
> Another thing we could try with much less effort is scaling it by the
> number of relations in the query.  There's already some code in the
> plancache that tries to estimate planning effort that way, IIRC.
> Such a scaling would be very legitimate for the cost of compiling
> tuple-deconstruction code, and for other expressions it'd kind of
> amount to an assumption that the expressions-per-table ratio is
> roughly constant.  If you don't like that, maybe some simple
> nonlinear growth rule would work.

I had imagined something a bit less all or nothing.  I had thought
that the planner could pretty cheaply choose if jit should occur or
not on a per-Expr level.  For WHERE clause items we know "norm_selec"
and we know what baserestrictinfos come before this RestrictInfo, so
we could estimate the number of executions per item in the WHERE
clause. For Exprs in the targetlist we have the estimated rows from
the RelOptInfo. HAVING clause Exprs will be evaluated a similar number
of times.   The planner could do something along the lines of
assuming, say 1000 * cpu_operator_cost to compile an Expr then assume
that a compiled Expr will be some percentage faster than an evaluated
one and only jit when the Expr is likely to be evaluated enough times
for it to be an overall win.  Optimize and inline would just have
higher thresholds.
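The heuristic described above could be sketched roughly like this (all names and the assumed speedup fraction are illustrative, not from any patch):

```python
CPU_OPERATOR_COST = 0.0025  # PostgreSQL's default cpu_operator_cost

def should_jit_expr(est_evals: float, eval_cost: float,
                    compile_cost: float = 1000 * CPU_OPERATOR_COST,
                    jit_saving_frac: float = 0.25) -> bool:
    """JIT an individual expression only if the estimated total saving
    across all its evaluations beats the one-off compile cost.
    est_evals would come from selectivity / row estimates as described
    above; jit_saving_frac is the assumed fraction of interpreted
    evaluation cost that compilation eliminates.  Optimize and inline
    would use the same rule with higher compile_cost values."""
    return est_evals * eval_cost * jit_saving_frac > compile_cost
```

An expression expected to be evaluated once never qualifies, while one expected to run a billion times easily does.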

David



Re: bad JIT decision

From: David Rowley
On Sun, 26 Jul 2020 at 02:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > ... nested at the bottom level join, about 6 joins deep.  The lack of
> > any row being found results in upper level joins not having to do
> > anything, and the majority of the plan is (never executed).
>
> On re-reading this, that last point struck me forcibly.  If most of
> the plan never gets executed, could we avoid compiling it?  That is,
> maybe JIT isn't JIT enough, and we should make compilation happen
> at first use of an expression not during executor startup.

That's interesting.  But it would introduce an additional per-evaluation
cost of checking whether we're doing the first execution.

David



Re: bad JIT decision

From: Alvaro Herrera
On 2020-Jul-24, Andres Freund wrote:

> I think the issue is more that we need to take into account that the
> overhead of JITing scales ~linearly with the number of JITed
> expressions. And that's not done right now.  I've had a patch somewhere
> that had a prototype implementation of changing the costing to be
> #expressions * some_cost, and I think that's a lot more accurate.

I don't quite understand why is it that a table with 1000 partitions
means that JIT compiles the thing 1000 times.  Sure, it is possible that
some partitions have a different column layout, but it seems an easy bet
that most cases are going to have identical column layout, and so tuple
deforming can be shared.  (I'm less sure about sharing a compile of an
expression, since the varno would vary. But presumably there's a way to
take the varno as an input value for the compiled expr too?)  Now I
don't actually know how this works so please correct if I misunderstand
it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: bad JIT decision

From: Scott Ribe
> On Jul 27, 2020, at 4:00 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> I don't quite understand why is it that a table with 1000 partitions
> means that JIT compiles the thing 1000 times.  Sure, it is possible that
> some partitions have a different column layout, but it seems an easy bet
> that most cases are going to have identical column layout, and so tuple
> deforming can be shared.  (I'm less sure about sharing a compile of an
> expression, since the varno would vary. But presumably there's a way to
> take the varno as an input value for the compiled expr too?)  Now I
> don't actually know how this works so please correct if I misunderstand
> it.

I'm guessing it's because of inlining. You could optimize a function that takes parameters, no problem. But what's
happening is inlining, with parameters, then optimizing.




Re: bad JIT decision

From: Andres Freund
Hi,

On 2020-07-25 10:54:18 -0400, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > ... nested at the bottom level join, about 6 joins deep.  The lack of
> > any row being found results in upper level joins not having to do
> > anything, and the majority of the plan is (never executed).
> 
> On re-reading this, that last point struck me forcibly.  If most of
> the plan never gets executed, could we avoid compiling it?  That is,
> maybe JIT isn't JIT enough, and we should make compilation happen
> at first use of an expression not during executor startup.

That unfortunately has its own downsides, in that there's significant
overhead of emitting code multiple times. I suspect that taking the
cost of all the JIT emissions together into account is the more
promising approach.

Greetings,

Andres Freund



Re: bad JIT decision

From: Alvaro Herrera
On 2020-Jul-27, Scott Ribe wrote:

> > On Jul 27, 2020, at 4:00 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > 
> > I don't quite understand why is it that a table with 1000 partitions
> > means that JIT compiles the thing 1000 times.  Sure, it is possible that
> > some partitions have a different column layout, but it seems an easy bet
> > that most cases are going to have identical column layout, and so tuple
> > deforming can be shared.  (I'm less sure about sharing a compile of an
> > expression, since the varno would vary. But presumably there's a way to
> > take the varno as an input value for the compiled expr too?)  Now I
> > don't actually know how this works so please correct if I misunderstand
> > it.
> 
> I'm guessing it's because of inlining. You could optimize a function
> that takes parameters, no problem. But what's happening is inlining,
> with parameters, then optimizing.

Are you saying that if you crank jit_inline_above_cost beyond this
query's total cost, the problem goes away?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: bad JIT decision

From: David Rowley
On Tue, 28 Jul 2020 at 11:00, Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-07-25 10:54:18 -0400, Tom Lane wrote:
> > David Rowley <dgrowleyml@gmail.com> writes:
> > > ... nested at the bottom level join, about 6 joins deep.  The lack of
> > > any row being found results in upper level joins not having to do
> > > anything, and the majority of the plan is (never executed).
> >
> > On re-reading this, that last point struck me forcibly.  If most of
> > the plan never gets executed, could we avoid compiling it?  That is,
> > maybe JIT isn't JIT enough, and we should make compilation happen
> > at first use of an expression not during executor startup.
>
> That unfortunately has its own downsides, in that there's significant
> overhead of emitting code multiple times. I suspect that taking the
> cost of all the JIT emissions together into account is the more
> promising approach.

Is there some reason that we can't consider jitting on a more granular
basis?  To me, it seems wrong to have a jit cost per expression and
demand that the plan cost > #nexprs * jit_expr_cost before we do jit
on anything.  It'll make it pretty hard to predict when jit will occur
and doing things like adding new partitions could suddenly cause jit
to not enable for some query any more.

ISTM a more granular approach would be better. For example, for the
expression we expect to evaluate once, there's likely little point in
jitting it, but for the one on some other relation that has more rows,
where we expect to evaluate it 1 billion times, there's likely good
reason to jit that.  Wouldn't it be better to consider it at the
RangeTblEntry level?

David



Re: bad JIT decision

From: Andres Freund
Hi,

On 2020-07-27 19:02:56 -0400, Alvaro Herrera wrote:
> On 2020-Jul-27, Scott Ribe wrote:
> 
> > > On Jul 27, 2020, at 4:00 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > 
> > > I don't quite understand why is it that a table with 1000 partitions
> > > means that JIT compiles the thing 1000 times.  Sure, it is possible that
> > > some partitions have a different column layout, but it seems an easy bet
> > > that most cases are going to have identical column layout, and so tuple
> > > deforming can be shared.  (I'm less sure about sharing a compile of an
> > > expression, since the varno would vary. But presumably there's a way to
> > > take the varno as an input value for the compiled expr too?)  Now I
> > > don't actually know how this works so please correct if I misunderstand
> > > it.
> > 
> > I'm guessing it's because of inlining. You could optimize a function
> > that takes parameters, no problem. But what's happening is inlining,
> > with parameters, then optimizing.

No, that's not what happens. The issue rather is that at execution time
there's simply nothing tying the partitioned parts of the query together
from the executor POV. Each table scan gets its own expressions to
evaluate quals etc. That's not a JIT specific thing, it's general.

Which then means a partitioned query with a projection and a where
clause applying at the partition level has > 2 expressions for each
partition. And they get a separate ExprState and get emitted separately.

One issue is that we don't take that into account for costing. The other
is the overhead, of course. Even when not JITed, that's a lot of work
that we don't actually need, except we don't know which partitions look
enough like others that we could reuse another expression.

One partial way to address this is to simply add a
LLVMAddMergeFunctionsPass() at the beginning of the optimization
pipeline. In my testing that can quite drastically cut down on
optimization time. But obviously solves the problem only to some degree,
since that's not free.


> Are you saying that if you crank jit_inline_above_cost beyond this
> query's total cost, the problem goes away?

FWIW, you can set the cost to -1 and it'll never inline.

Greetings,

Andres Freund



Re: bad JIT decision

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> On 2020-07-27 19:02:56 -0400, Alvaro Herrera wrote:
>>> I don't quite understand why is it that a table with 1000 partitions
>>> means that JIT compiles the thing 1000 times.  Sure, it is possible that
>>> some partitions have a different column layout, but it seems an easy bet
>>> that most cases are going to have identical column layout, and so tuple
>>> deforming can be shared.

> No, that's not what happens. The issue rather is that at execution time
> there's simply nothing tying the partitioned parts of the query together
> from the executor POV. Each table scan gets its own expressions to
> evaluate quals etc. That's not a JIT specific thing, it's general.

I think what Alvaro is imagining is caching the results of compiling
tuple-deforming.  You could hash on the basis of all the parts of the
tupdesc that the deforming compiler cares about, and then share the
compiled code across different relations with similar tupdescs.
That could win for lots-o-partitions cases, and it could win across
successive queries on the same relation, too.

Maybe the same principle could be applied to compiled expressions,
but it's less obvious that you'd get enough matches to win.

            regards, tom lane



Re: bad JIT decision

From: Andres Freund
Hi,

On 2020-07-28 11:54:53 +1200, David Rowley wrote:
> Is there some reason that we can't consider jitting on a more granular
> basis?

There's a substantial "constant" overhead of doing JIT. And it's
nontrivial to determine which parts of the query should be JITed and
which not.


> To me, it seems wrong to have a jit cost per expression and
> demand that the plan cost > #nexprs * jit_expr_cost before we do jit
> on anything.  It'll make it pretty hard to predict when jit will occur
> and doing things like adding new partitions could suddenly cause jit
> to not enable for some query any more.

I think that's the right answer though:

> ISTM a more granular approach would be better. For example, for the
> expression we expect to evaluate once, there's likely little point in
> jitting it, but for the one on some other relation that has more rows,
> where we expect to evaluate it 1 billion times, there's likely good
> reason to jit that.  Wouldn't it be better to consider it at the
> RangeTblEntry level?

Because this'd still JIT if a query has 10k unconditional partition
accesses with the corresponding accesses, even if they're all just one
row?

(I'm rebasing my tree that tries to reduce the overhead / allow caching
/ increase efficiency to current PG, but it's a fair bit of work)

Greetings,

Andres Freund



Re: bad JIT decision

From: Andres Freund
Hi,

On 2020-07-28 14:07:48 -0700, Andres Freund wrote:
> (I'm rebasing my tree that tries to reduce the overhead / allow caching
> / increase efficiency to current PG, but it's a fair bit of work)

FWIW, I created a demo workload for this, and repro'ed the issue with
it. Those improvements do make a very significant difference:

CREATE FUNCTION exec(text) returns text language plpgsql volatile
AS $f$
    BEGIN
      EXECUTE $1;
      RETURN $1;
    END;
$f$;
CREATE TABLE manypa(category text not null, data text not null) PARTITION BY LIST(category);
SELECT exec('CREATE TABLE manypa_'||g.i||' PARTITION OF manypa FOR VALUES IN('||g.i||')')
FROM generate_series(1, 1000) g(i);
INSERT INTO manypa(category, data) VALUES('1', '1');

EXPLAIN ANALYZE SELECT * FROM manypa
WHERE data <> '17' AND data <> '15' AND data <> '13' AND data <> '11' AND data <> '9'
  AND data <> '7' AND data <> '5' AND data <> '3' AND data <> '1';
 

Before:
    Timing: Generation 335.345 ms, Inlining 51.025 ms, Optimization 11967.776 ms, Emission 9201.499 ms, Total 21555.645 ms
    IR size: unoptimized: 9022868 bytes, optimized: 6206368 bytes

After:
    Timing: Generation 261.283 ms, Inlining 30.875 ms, Optimization 1671.969 ms, Emission 18.557 ms, Total 1982.683 ms
    IR size: unoptimized 8776100 bytes, optimized 115868 bytes

That obviously needs to be improved further, but it's already a lot
better. In particular after these changes the generated code could be
cached.


One thing that could make a huge difference here is to be able to
determine whether two expressions and/or tlists are equivalent
cheaply... I know that David has some need for that too.

Greetings,

Andres Freund



Re: bad JIT decision

From: David Rowley
On Wed, 29 Jul 2020 at 09:07, Andres Freund <andres@anarazel.de> wrote:
> On 2020-07-28 11:54:53 +1200, David Rowley wrote:
> > Is there some reason that we can't consider jitting on a more granular
> > basis?
>
> There's a substantial "constant" overhead of doing JIT. And that it's
> nontrival to determine which parts of the query should be JITed in one
> part, and which not.
>
>
> > To me, it seems wrong to have a jit cost per expression and
> > demand that the plan cost > #nexprs * jit_expr_cost before we do jit
> > on anything.  It'll make it pretty hard to predict when jit will occur
> > and doing things like adding new partitions could suddenly cause jit
> > to not enable for some query any more.
>
> I think that's the right answer though:

I'm not quite sure why it would be so hard to do more granularly.

Take this case, for example:

create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
insert into listp select 1,x from generate_series(1,1000000) x;

The EXPLAIN looks like:

postgres=# explain select * from listp where b < 100;
                                QUERY PLAN
--------------------------------------------------------------------------
 Append  (cost=0.00..16967.51 rows=853 width=8)
   ->  Seq Scan on listp1 listp_1  (cost=0.00..16925.00 rows=100 width=8)
         Filter: (b < 100)
   ->  Seq Scan on listp2 listp_2  (cost=0.00..38.25 rows=753 width=8)
         Filter: (b < 100)
(5 rows)

For now, if the total cost of the plan exceeded the jit threshold,
then we'd JIT all the expressions. If it didn't, we'd compile none of
them.

What we could do instead would just add the jitFlags field into struct
Plan to indicate the JIT flags on a per plan node level and enable it
as we do now based on the total_cost of that plan node rather than at
the top-level of the plan as we do now in standard_planner(). The
jitFlags setting code would be moved to the end of
create_plan_recurse() instead.

In this case, if we had the threshold set to 10000, then we'd JIT for
listp1 but not for listp2. I don't think this would even require a
signature change in the jit_compile_expr() function as we can get
access to the plan node from state->parent->plan to see which jitFlags
are set, if any.
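In outline, the per-node decision could look something like this (a toy sketch of the idea, not the actual patch; plan nodes here are plain dicts, and the thresholds mirror the stock GUC defaults except for the lowered jit threshold from the example above):

```python
# JIT flag bits, standing in for the kind of per-node jitFlags described above.
PERFORM, OPTIMIZE, INLINE = 1, 2, 4

def assign_jit_flags(plan, jit_above=10000, opt_above=500000, inline_above=500000):
    """Set jit_flags from each node's own total_cost rather than the
    query-wide cost, recursing into child nodes."""
    flags = 0
    if plan["total_cost"] > jit_above:
        flags |= PERFORM
        if plan["total_cost"] > opt_above:
            flags |= OPTIMIZE
        if plan["total_cost"] > inline_above:
            flags |= INLINE
    plan["jit_flags"] = flags
    for child in plan.get("children", []):
        assign_jit_flags(child, jit_above, opt_above, inline_above)
    return plan
```

With the listp example and a threshold of 10000, the scan of listp1 (cost 16925) gets JITed while the scan of listp2 (cost 38.25) does not.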

David



Re: bad JIT decision

From: David Rowley
On Wed, 29 Jul 2020 at 09:28, Andres Freund <andres@anarazel.de> wrote:
> FWIW, I created a demo workload for this, and repro'ed the issue with
> that. Those improvements does make a very significant difference:

> Before:
>     Timing: Generation 335.345 ms, Inlining 51.025 ms, Optimization 11967.776 ms, Emission 9201.499 ms, Total 21555.645 ms
>     IR size: unoptimized: 9022868 bytes, optimized: 6206368 bytes
>
> After:
>     Timing: Generation 261.283 ms, Inlining 30.875 ms, Optimization 1671.969 ms, Emission 18.557 ms, Total 1982.683 ms
>     IR size: unoptimized 8776100 bytes, optimized 115868 bytes

That's a really impressive speedup.  However, no matter how fast we
make the compilation, it's still most likely to be a waste of time
doing it for plan nodes that are just not that costly.

I just wrote a patch to consider JIT on a per-plan-node basis instead
of globally over the entire plan. I'll post it to -hackers.

With a 1000 partition table where all of the cost is on just 1
partition, running a query that hits all partitions, I see:

Master jit=on:
 JIT:
   Functions: 3002
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 141.587 ms, Inlining 11.760 ms, Optimization 6518.664 ms, Emission 3152.266 ms, Total 9824.277 ms
 Execution Time: 12588.292 ms

Master jit=off:
 Execution Time: 3672.391 ms

Patched jit=on:
 JIT:
   Functions: 5
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.675 ms, Inlining 3.322 ms, Optimization 10.766 ms, Emission 5.892 ms, Total 20.655 ms
 Execution Time: 2754.160 ms

Most likely the EXPLAIN output will need to do something more than
show true/false for the options here, but I didn't want to go to too
much trouble unless this is seen as a good direction to go in.

> That obviously needs to be improved further, but it's already a lot
> better. In particular after these changes the generated code could be
> cached.

That would be a game-changer.

David