Thread: Query is over 2x slower with jit=on
Hi all.
I don't know whether this is expected or not, but I'm experiencing an over-2x slowdown on a large query in PG 11 with jit=on.
(The query is a prepared statement executed with "explain analyze execute myprepared(arg1, arg2, ..., argn)".)
After 10 executions these are the results (the first 5 executed in > 16s, then the plan changed):
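The setup looks roughly like this (a minimal sketch; the real table, columns and arguments aren't shown in this thread, so the names below are made up):

    PREPARE myprepared(date, date) AS
        SELECT count(*)                      -- stand-in for the real aggregation
        FROM some_large_table                -- hypothetical table
        WHERE created BETWEEN $1 AND $2;

    -- The first five executions use custom plans; after that the plan cache
    -- typically switches to a generic plan, hence "then the plan changed".
    EXPLAIN ANALYZE EXECUTE myprepared('2018-01-01', '2018-04-01');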
With jit=on:
https://explain.depesz.com/s/vYB
Planning Time: 0.336 ms
JIT:
Functions: 716
Generation Time: 78.404 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 43.916 ms
Emission Time: 600.031 ms
Execution Time: 2035.150 ms
(385 rows)
With jit=off:
https://explain.depesz.com/s/X6mA
Planning Time: 0.371 ms
Execution Time: 833.941 ms
(377 rows)
Both are master as of 55d26ff638f063fbccf57843f2c27f9795895a5c
The query largely consists of CTEs with aggregation which are FULL OUTER JOIN'ed.
On v10 the query executes in:
Execution time: 1159.628 ms
So v11 (with jit=off) is about 25% faster (due to parallel hash-join I think), which is nice!
What's the deal with jit making it slower?
--
Andreas Joseph Krogh
On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
> With jit=on:
> https://explain.depesz.com/s/vYB
> Planning Time: 0.336 ms
> JIT:
> Functions: 716
> Generation Time: 78.404 ms
> Inlining: false
> Inlining Time: 0.000 ms
> Optimization: false
> Optimization Time: 43.916 ms
> Emission Time: 600.031 ms

Any chance this is a debug LLVM build?

> What's the deal with jit making it slower?

JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.

Greetings,

Andres Freund
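For context, the "heuristics" here are a handful of cost-based GUCs (a sketch with PostgreSQL 11 defaults; not part of the original mail):

    -- jit=on only arms these thresholds; JIT actually fires per plan cost.
    SET jit = on;
    SET jit_above_cost = 100000;           -- JIT-compile above this total plan cost
    SET jit_inline_above_cost = 500000;    -- additionally inline functions
    SET jit_optimize_above_cost = 500000;  -- additionally run the LLVM optimizer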
On Wednesday 18 April 2018 at 17:50:55, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
> With jit=on:
> https://explain.depesz.com/s/vYB
> Planning Time: 0.336 ms
> JIT:
> Functions: 716
> Generation Time: 78.404 ms
> Inlining: false
> Inlining Time: 0.000 ms
> Optimization: false
> Optimization Time: 43.916 ms
> Emission Time: 600.031 ms
Any chance this is a debug LLVM build?
> What's the deal with jit making it slower?
JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.
PostgreSQL is built with "--enable-debug --with-llvm". LLVM is the one that comes with Ubuntu 17.10.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
On 18 April 2018 at 16:50, Andres Freund <andres@anarazel.de> wrote:
> On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
>> With jit=on:
>> https://explain.depesz.com/s/vYB
>> Planning Time: 0.336 ms
>> JIT:
>> Functions: 716
>> Generation Time: 78.404 ms
>> Inlining: false
>> Inlining Time: 0.000 ms
>> Optimization: false
>> Optimization Time: 43.916 ms
>> Emission Time: 600.031 ms
>
> Any chance this is a debug LLVM build?
>
>> What's the deal with jit making it slower?
>
> JIT has cost, and sometimes it's not beneficial. Here our heuristics
> when to JIT appear to be a bit off. In the parallel world it's worse
> because the JITing is duplicated for parallel workers atm.

Please change the name of the "JIT" parameter to something meaningful
to humans before this gets too far into the wild.

SSL is somewhat understandable because it's not a Postgres-private term.

geqo is regrettable and we really don't want any more too-short/abbreviated
parameter names. Think of our EOU if every GUC was a TLA.

Thanks

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/18/2018 12:27 PM, Simon Riggs wrote:
> Please change the name of the "JIT" parameter to something meaningful
> to humans before this gets too far into the wild.
>
> SSL is somewhat understandable because it's not a Postgres-private term.

JIT is hardly a Postgres-private term. It's a familiar term in a
widespread community, though I might cede the point that the
community in question is more compiler wonks than DBAs.

-Chap
On April 18, 2018 9:50:48 AM PDT, Chapman Flack <chap@anastigmatix.net> wrote:
> On 04/18/2018 12:27 PM, Simon Riggs wrote:
>
>> Please change the name of the "JIT" parameter to something meaningful
>> to humans before this gets too far into the wild.
>>
>> SSL is somewhat understandable because it's not a Postgres-private term.
>
> JIT is hardly a Postgres-private term. It's a familiar term in a
> widespread community, though I might cede the point that the
> community in question is more compiler wonks than DBAs.

You're right. There's another thread where we discussed this, also due to a
concern of Simon's. Can we not do so separately here?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, Apr 18, 2018 at 11:50 AM, Andres Freund <andres@anarazel.de> wrote:
> JIT has cost, and sometimes it's not beneficial. Here our heuristics
> when to JIT appear to be a bit off. In the parallel world it's worse
> because the JITing is duplicated for parallel workers atm.

It seems like you're describing it as if the JIT just didn't produce
gains sufficient to make up for the cost of doing it, but that's not
really the issue here AFAICS. Here the JIT actually made code that
ran slower than the un-JIT-ted code. That seems like a different sort
of problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On April 18, 2018 12:16:35 PM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 18, 2018 at 11:50 AM, Andres Freund <andres@anarazel.de> wrote:
>> JIT has cost, and sometimes it's not beneficial. Here our heuristics
>> when to JIT appear to be a bit off. In the parallel world it's worse
>> because the JITing is duplicated for parallel workers atm.
>
> It seems like you're describing it as if the JIT just didn't produce
> gains sufficient to make up for the cost of doing it, but that's not
> really the issue here AFAICS. Here the JIT actually made code that
> ran slower than the un-JIT-ted code. That seems like a different sort
> of problem.

Not convinced that that is true - the issue is more likely that JIT work in
workers is counted as execute time... Gotta add that somehow, not sure what
the best way would be.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, Apr 18, 2018 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
> Not convinced that that is true - the issue is more likely that JIT work in
> workers is counted as execute time... Gotta add that somehow, not sure what
> the best way would be.

Oh, that does seem like something that should be fixed. If that's
what is happening here, it's bound to confuse a lot of people.
Probably you need to add some code to
ExecParallelRetrieveInstrumentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wednesday 18 April 2018 at 18:26:03, Andreas Joseph Krogh <andreas@visena.com> wrote:
On Wednesday 18 April 2018 at 17:50:55, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
> With jit=on:
> https://explain.depesz.com/s/vYB
> Planning Time: 0.336 ms
> JIT:
> Functions: 716
> Generation Time: 78.404 ms
> Inlining: false
> Inlining Time: 0.000 ms
> Optimization: false
> Optimization Time: 43.916 ms
> Emission Time: 600.031 ms
Any chance this is a debug LLVM build?
> What's the deal with jit making it slower?
JIT has cost, and sometimes it's not beneficial. Here our heuristics
when to JIT appear to be a bit off. In the parallel world it's worse
because the JITing is duplicated for parallel workers atm.

PostgreSQL is built with "--enable-debug --with-llvm". LLVM is the one that comes with Ubuntu 17.10.
Some more info:
Without --enable-debug (this didn't seem to impact performance much):
First 5 executions (prepared-statement issued from psql)
Planning Time: 47.634 ms
JIT:
Functions: 725
Generation Time: 74.748 ms
Inlining: true
Inlining Time: 90.763 ms
Optimization: true
Optimization Time: 5822.516 ms
Emission Time: 3089.127 ms
Execution Time: 16375.996 ms
After 5 executions (prepared-statement issued from psql)
Planning Time: 0.385 ms
JIT:
Functions: 716
Generation Time: 76.382 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 41.709 ms
Emission Time: 613.074 ms
Execution Time: 2031.830 ms
jit=off:
Planning Time: 0.171 ms
Execution Time: 832.489 ms
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Hi, On 2018-04-18 18:37:30 -0400, Robert Haas wrote: > On Wed, Apr 18, 2018 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote: > > Not convinced that that is true - the issue is more likely that JIT work in workers is counted as execute time... Gottaadd that somehow, not sure what the best way would be. > > Oh, that does seem like something that should be fixed. If that's > what is happening here, it's bound to confuse a lot of people. > Probably you need to add some code to > ExecParallelRetrieveInstrumentation. I had lost track of this, and we unfortunately hadn't added an open item back then. I think we should add it now? RMT (with me recused), do you think we should accept the new code fixing this would entail? And thus that this should be an open item? It's arguably a new feature, although I don't find that a terribly convincing position. Greetings, Andres Freund
> On Aug 22, 2018, at 12:12 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2018-04-18 18:37:30 -0400, Robert Haas wrote:
>> On Wed, Apr 18, 2018 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
>>> Not convinced that that is true - the issue is more likely that JIT work in
>>> workers is counted as execute time... Gotta add that somehow, not sure what
>>> the best way would be.
>>
>> Oh, that does seem like something that should be fixed. If that's
>> what is happening here, it's bound to confuse a lot of people.
>> Probably you need to add some code to
>> ExecParallelRetrieveInstrumentation.
>
> I had lost track of this, and we unfortunately hadn't added an open item
> back then. I think we should add it now?
>
> RMT (with me recused), do you think we should accept the new code fixing
> this would entail? And thus that this should be an open item? It's
> arguably a new feature, although I don't find that a terribly convincing
> position.

Reviewed the earlier discussion. I can argue this both ways. As it stands
right now it seems like we will be defaulting "jit = off", which means there
will only be so many people who will be using jit in PG11, and as such we
could provide the patch for 12.

However, for the people who do enable JIT in PG11, this is a scenario that
would cause the query to perform in an unexpected way that is no fault of
their own, which one would argue is a bug. I have not tried to reproduce it
myself, but from looking at the explain plans from Andreas, I'm confident I
could craft a moderately complex query that could demonstrate the
performance degradation with jit = on.

While adding the code may constitute being a new feature, it is a feature
that would prevent user frustration on something that we are highlighting as
a "major feature" of PG11, even if it's not enabled by default.

What I would ask you, Andres, is how complex will the patch be and how much
time will it take? As such I would +1 it for open items right now unless it
seems like this could take a significant amount of time to come up with a
proper patch, in which case I would reconsider, but at least with it on open
items we would be able to continuously review.

Thanks,

Jonathan
On Wednesday 22 August 2018 at 18:12:41, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-04-18 18:37:30 -0400, Robert Haas wrote:
> On Wed, Apr 18, 2018 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
> > Not convinced that that is true - the issue is more likely that JIT work in workers is counted as execute time... Gotta add that somehow, not sure what the best way would be.
>
> Oh, that does seem like something that should be fixed. If that's
> what is happening here, it's bound to confuse a lot of people.
> Probably you need to add some code to
> ExecParallelRetrieveInstrumentation.
I had lost track of this, and we unfortunately hadn't added an open item
back then. I think we should add it now?
RMT (with me recused), do you think we should accept the new code fixing
this would entail? And thus that this should be an open item? It's
arguably a new feature, although I don't find that a terribly convincing
position.
Greetings,
Andres Freund
Just to be clear; The query really runs slower (wall-clock time), it's not just the timing.
--
Andreas Joseph Krogh
On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
> Just to be clear; The query really runs slower (wall-clock time), it's not
> just the timing.

I bet it's not actually running slower, it "just" takes longer to start up
due to the JITing in each worker. I suspect what we should do is to multiply
the cost limits by the number of workers, to model that. But without the
fixed instrumentation that's harder to see...

Greetings,

Andres Freund
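One quick way to test that hypothesis (a sketch, reusing the hypothetical prepared statement from earlier; not from the original mail) is to take the workers out of the picture:

    -- If per-worker JITing is the culprit, disabling parallelism should
    -- shrink the gap between jit=on and jit=off.
    SET max_parallel_workers_per_gather = 0;
    EXPLAIN ANALYZE EXECUTE myprepared('2018-01-01', '2018-04-01');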
On Wednesday 22 August 2018 at 18:51:55, Andres Freund <andres@anarazel.de> wrote:
On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
> Just to be clear; The query really runs slower (wall-clock time), it's not
> just the timing.
I bet it's not actually running slower, it "just" takes longer to start
up due to the JITing in each worker. I suspect what we should do is to
multiply the cost limits by the number of workers, to model that. But
without the fixed instrumentation that's harder to see...
Well, yes, that might be. By "runs" I meant from me hitting ENTER in psql to the time the query finishes...
I thought JITing of prepared queries happened once (in "prepare") so it didn't have to do the JITing every time the query is executed. Isn't the previously generated bytecode usable for subsequent queries?
--
Andreas Joseph Krogh
On Wednesday, August 22, 2018 6:51:55 PM CEST Andres Freund wrote:
> On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
>> Just to be clear; The query really runs slower (wall-clock time), it's not
>> just the timing.
>
> I bet it's not actually running slower, it "just" takes longer to start up
> due to the JITing in each worker. I suspect what we should do is to
> multiply the cost limits by the number of workers, to model that. But
> without the fixed instrumentation that's harder to see...

It depends on the query. It has been shown in other threads that a query can
indeed take longer to run because of JITing: if the cost is too low to fire
the LLVM optimizer, the generated code can be so bad it will be slower than
the non-JIT executor.

Cf for instance a previous discussion here:
http://www.postgresql-archive.org/PATCH-LLVM-tuple-deforming-improvements-td6029385.html

I think it would be interesting to try the query from this thread with a
patch forcing the LLVM codegen to O1 (I found no PassManager there to play
with, it seems to be an off/on/extreme switch; patch
0001-LLVM-Use-the-O1-CodeGen-level.patch from the thread mentioned above).
On 2018-08-22 19:51:12 +0200, Andreas Joseph Krogh wrote:
> I thought JITing of prepared queries happened once (in "prepare")

No, it happens when the first JITed function is executed.

> so it didn't have to do the JITing every time the query is
> executed. Isn't the previously generated bytecode usable for
> subsequent queries?

No, not currently. There's some reasons preventing that (primarily that we
currently rely on addresses of certain things not to change during
execution). There's ongoing work to change that, but that's certainly not
going to be ready for v11.

Greetings,

Andres Freund
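In other words, every execution of a JIT-ed plan pays the full generation/optimization/emission cost again (a sketch, reusing the hypothetical statement from above):

    -- Under PG 11 both executions report roughly the same JIT timings,
    -- since nothing is cached across EXECUTEs.
    EXPLAIN ANALYZE EXECUTE myprepared('2018-01-01', '2018-04-01');
    EXPLAIN ANALYZE EXECUTE myprepared('2018-01-01', '2018-04-01');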
On Wednesday 22 August 2018 at 20:52:05, Andres Freund <andres@anarazel.de> wrote:
On 2018-08-22 19:51:12 +0200, Andreas Joseph Krogh wrote:
> I thought JITing of prepared queries happened once (in "prepare")
No, it happens when the first JITed function is executed.
> so it didn't have to do the JITing every time the query is
> executed. Isn't the previously generated bytecode usable for
> subsequent queries?
No, not currently. There's some reasons preventing that (primarily that
we currently rely on addresses of certain things not to change during
execution). There's ongoing work to change that, but that's certainly
not going to be ready for v11.
Greetings,
Andres Freund
Ok, thanks for clarifying.
--
Andreas Joseph Krogh
On 2018-08-22 20:48:48 +0200, Pierre Ducroquet wrote:
> On Wednesday, August 22, 2018 6:51:55 PM CEST Andres Freund wrote:
>> On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
>>> Just to be clear; The query really runs slower (wall-clock time), it's not
>>> just the timing.
>>
>> I bet it's not actually running slower, it "just" takes longer to start up
>> due to the JITing in each worker. I suspect what we should do is to
>> multiply the cost limits by the number of workers, to model that. But
>> without the fixed instrumentation that's harder to see...
>
> It depends on the query. It has been shown in other threads that a query can
> indeed take longer to run because of JITing: if the cost is too low to fire
> the LLVM optimizer, the generated code can be so bad it will be slower than
> the non-JIT executor.

This largely seems to be orthogonal to what I'm talking about.

> Cf for instance a previous discussion here:
> http://www.postgresql-archive.org/PATCH-LLVM-tuple-deforming-improvements-td6029385.html

I'd wish people stopped using www.postgresql-archive.org. It's *NOT*
postgresql.org maintained, and in fact I do not know who does. It does show
ads when downloading links, which I'm personally not ok with.

Greetings,

Andres Freund
On Aug 22, 2018, at 2:58 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
> On Wednesday 22 August 2018 at 20:52:05, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-08-22 19:51:12 +0200, Andreas Joseph Krogh wrote:
>>> I thought JITing of prepared queries happened once (in "prepare")
>>
>> No, it happens when the first JITed function is executed.
>>
>>> so it didn't have to do the JITing every time the query is
>>> executed. Isn't the previously generated bytecode usable for
>>> subsequent queries?
>>
>> No, not currently. There's some reasons preventing that (primarily that
>> we currently rely on addresses of certain things not to change during
>> execution). There's ongoing work to change that, but that's certainly
>> not going to be ready for v11.
>>
>> Greetings,
>>
>> Andres Freund
>
> Ok, thanks for clarifying.
Per earlier note[1] I was able to reproduce this issue with similar results to
Andreas while running 11 Beta 3.
jit = on
https://explain.depesz.com/s/vgzD
Planning Time: 0.921 ms
JIT:
Functions: 193
Generation Time: 121.595 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 58.045 ms
Emission Time: 1201.100 ms
Execution Time: 1628.017 ms
jit = off
https://explain.depesz.com/s/AvZM
Planning Time: 1.398 ms
Execution Time: 256.473 ms
I increased the search range I used in the query by 3x, and got these numbers:
jit=on
Planning Time: 0.931 ms
JIT:
Functions: 184
Generation Time: 126.587 ms
Inlining: true
Inlining Time: 98.865 ms
Optimization: true
Optimization Time: 20518.982 ms
Emission Time: 7270.963 ms
Execution Time: 28772.576 ms
jit=off
Planning Time: 1.527 ms
Execution Time: 959.160 ms
So, I would +1 this for open items.
Jonathan
On 2018-08-22 18:29:58 -0400, Jonathan S. Katz wrote:
>> On Aug 22, 2018, at 2:58 PM, Andreas Joseph Krogh <andreas@visena.com> wrote:
>> [snip]
>
> Per earlier note[1] I was able to reproduce this issue with similar results to
> Andreas while running 11 Beta 3.
>
> jit = on
> https://explain.depesz.com/s/vgzD
>
> Planning Time: 0.921 ms
> JIT:
> Functions: 193
> Generation Time: 121.595 ms
> Inlining: false
> Inlining Time: 0.000 ms
> Optimization: false
> Optimization Time: 58.045 ms
> Emission Time: 1201.100 ms
> Execution Time: 1628.017 ms
>
> jit = off
> https://explain.depesz.com/s/AvZM
>
> Planning Time: 1.398 ms
> Execution Time: 256.473 ms
>
> I increased the search range I used in the query by 3x, and got these numbers:
>
> jit=on
> Planning Time: 0.931 ms
> JIT:
> Functions: 184
> Generation Time: 126.587 ms
> Inlining: true
> Inlining Time: 98.865 ms
> Optimization: true
> Optimization Time: 20518.982 ms
> Emission Time: 7270.963 ms
> Execution Time: 28772.576 ms
>
> jit=off
> Planning Time: 1.527 ms
> Execution Time: 959.160 ms

For the archives sake: This likely largely is the consequence of building
with LLVM's expensive assertions enabled, as confirmed by Jonathan over IM.

Greetings,

Andres Freund
> On Aug 22, 2018, at 7:13 PM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2018-08-22 18:29:58 -0400, Jonathan S. Katz wrote:
>> [snip]
>
> For the archives sake: This likely largely is the consequence of
> building with LLVM's expensive assertions enabled, as confirmed by
> Jonathan over IM.

I recompiled with the release version of LLVM. jit=on was still slower,
but the discrepancy was not as bad as the previously reported result:

jit = off
Planning Time: 0.938 ms
Execution Time: 935.599 ms

jit = on
Planning Time: 0.951 ms
JIT:
Functions: 184
Generation Time: 17.605 ms
Inlining: true
Inlining Time: 20.522 ms
Optimization: true
Optimization Time: 1001.034 ms
Emission Time: 665.319 ms
Execution Time: 2491.560 ms

However, it was still 2x+ slower, so still +1ing for open items.

Jonathan
On Thursday 23 August 2018 at 03:00:42, Jonathan S. Katz <jkatz@postgresql.org> wrote:
> On Aug 22, 2018, at 7:13 PM, Andres Freund <andres@anarazel.de> wrote:
[snip]
> For the archives sake: This likely largely is the consequence of
> building with LLVM's expensive assertions enabled, as confirmed by
> Jonathan over IM.
I recompiled with the release version of LLVM. jit=on was still slower,
but the discrepancy was not as bad as the previously reported result:
jit = off
Planning Time: 0.938 ms
Execution Time: 935.599 ms
jit = on
Planning Time: 0.951 ms
JIT:
Functions: 184
Generation Time: 17.605 ms
Inlining: true
Inlining Time: 20.522 ms
Optimization: true
Optimization Time: 1001.034 ms
Emission Time: 665.319 ms
Execution Time: 2491.560 ms
However, it was still 2x+ slower, so still +1ing for open items.
I compiled against the LLVM that comes with Ubuntu 18.04, with whatever switches it was built with, and without debugging or assertions.
--
Andreas Joseph Krogh
On Thursday 23 August 2018 at 09:14:56, Andreas Joseph Krogh <andreas@visena.com> wrote:
On Thursday 23 August 2018 at 03:00:42, Jonathan S. Katz <jkatz@postgresql.org> wrote:
> On Aug 22, 2018, at 7:13 PM, Andres Freund <andres@anarazel.de> wrote:
[snip]
> For the archives sake: This likely largely is the consequence of
> building with LLVM's expensive assertions enabled, as confirmed by
> Jonathan over IM.
I recompiled with the release version of LLVM. jit=on was still slower,
but the discrepancy was not as bad as the previously reported result:
jit = off
Planning Time: 0.938 ms
Execution Time: 935.599 ms
jit = on
Planning Time: 0.951 ms
JIT:
Functions: 184
Generation Time: 17.605 ms
Inlining: true
Inlining Time: 20.522 ms
Optimization: true
Optimization Time: 1001.034 ms
Emission Time: 665.319 ms
Execution Time: 2491.560 ms
However, it was still 2x+ slower, so still +1ing for open items.

I compiled against the LLVM that comes with Ubuntu 18.04, with whatever switches it was built with, and without debugging or assertions.
With 11b3 as of 825f10fbda7a5d8a48d187b8193160e5e44e4011 I'm repeatedly getting these results with jit=on, after 10 runs:
Planning Time: 0.266 ms
JIT:
Functions: 686
Generation Time: 71.895 ms
Inlining: false
Inlining Time: 0.000 ms
Optimization: false
Optimization Time: 39.906 ms
Emission Time: 589.944 ms
Execution Time: 2198.928 ms
Turning jit=off gives this:
Planning Time: 0.180 ms
Execution Time: 938.451 ms
I can provide the dataset off-list if anyone wants to look into this.
--
Andreas Joseph Krogh
On 22 August 2018 at 22:21, Andres Freund <andres@anarazel.de> wrote:
> On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
>> Just to be clear; The query really runs slower (wall-clock time), it's not
>> just the timing.
>
> I bet it's not actually running slower, it "just" takes longer to start up
> due to the JITing in each worker. I suspect what we should do is to
> multiply the cost limits by the number of workers, to model that. But
> without the fixed instrumentation that's harder to see...

Attached is a patch that accumulates the per-worker jit counters into the
leader process.

Similar to SharedExecutorInstrumentation, I added another shared memory
segment SharedJitInstrumentation, which is used to accumulate each of the
workers' jit instrumentation into the leader's. Earlier I was thinking we
could have a common SharedExecutorInstrumentation structure that holds both
instrumentations: the usual instrumentation and the jit instrumentation.
But the fields in SharedExecutorInstrumentation are mostly not relevant to
JitInstrumentation, so I kept a separate segment in shared memory.

It may happen that the jit context does not get created in the backend
because no jit compile was done there, but in the workers it got created.
In that case, before retrieving the jit data from the workers, the
estate->es_jit needs to be allocated in the backend. For that, I had to add
a new function jit_create_context(), and for that a new jit provider
callback function create_context(). I used the existing
llvm_create_context() for this callback function.

JitContext now has those counters in a separate JitInstrumentation
structure, so that the same structure can be shared by
SharedJitInstrumentation and JitContext.

-----------------

I think we better show per-worker jit info also. The current patch does not
show that. I think it would be easy to continue on the patch to show
per-worker info also: under the Gather node, we can show per-worker jit
counters. I think this would be useful too, besides the cumulative figures
in the leader process. Comments?

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,

On 2018-09-10 15:42:55 +0530, Amit Khandekar wrote:
> Attached is a patch that accumulates the per-worker jit counters into
> the leader process.

Thanks!

> I think we better show per-worker jit info also. The current patch
> does not show that. I think it would be easy to continue on the patch
> to show per-worker info also. Under the Gather node, we can show
> per-worker jit counters. I think this would be useful too, besides the
> cumulative figures in the leader process. Comments ?

Yes, I think that'd be good. I think we either should print the stats at
the top level as $leader_value, $combined_worker_value, $total_value or
just have the $combined_worker_value at the level where we print other
stats from the worker, too.

> /*
>  * Add up the workers' JIT instrumentation from dynamic shared memory.
>  */
> static void
> ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
>                                        SharedJitInstrumentation *shared_jit)
> {
>     int         n;
>     JitContext *jit = planstate->state->es_jit;
>
>     /* If the leader hasn't yet created a jit context, allocate one now. */
>     if (!jit)
>     {
>         planstate->state->es_jit = jit =
>             jit_create_context(planstate->state->es_jit_flags);
>     }

Wouldn't it be better to move the jit instrumentation to outside of the
context, to avoid having to do this? Or just cope with not having
instrumentation for the leader in this case? We'd kinda need to deal with
failure to create one anyway?

Greetings,

Andres Freund
On 10 September 2018 at 21:39, Andres Freund <andres@anarazel.de> wrote:
> On 2018-09-10 15:42:55 +0530, Amit Khandekar wrote:
>> Attached is a patch that accumulates the per-worker jit counters into
>> the leader process.
>
> Thanks!
>
>> I think we better show per-worker jit info also. The current patch
>> does not show that. I think it would be easy to continue on the patch
>> to show per-worker info also. Under the Gather node, we can show
>> per-worker jit counters. I think this would be useful too, besides the
>> cumulative figures in the leader process. Comments ?
>
> Yes, I think that'd be good.

Ok. Will continue on the patch.

> I think we either should print the stats at
> the top level as $leader_value, $combined_worker_value, $total_value or
> just have the $combined_worker_value at the level where we print other
> stats from the worker, too.

Yes, I think we can follow and be consistent with whatever way in which the
other worker stats are printed. Will check.

Note: Since there can be multiple separate Gather plans under a plan tree,
I think we can show this info for each Gather plan.

>> /* If the leader hasn't yet created a jit context, allocate one now. */
>> if (!jit)
>> {
>>     planstate->state->es_jit = jit =
>>         jit_create_context(planstate->state->es_jit_flags);
>> }
>
> Wouldn't it be better to move the jit instrumentation to outside of the
> context, to avoid having to do this? Or just cope with not having
> instrumentation for the leader in this case? We'd kinda need to deal
> with failure to create one anyway?

Yeah, I think taking the instrumentation out of the context looks better.
Will work on that.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 11 September 2018 at 14:50, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 10 September 2018 at 21:39, Andres Freund <andres@anarazel.de> wrote:
>> I think we either should print the stats at
>> the top level as $leader_value, $combined_worker_value, $total_value or
>> just have the $combined_worker_value at the level where we print other
>> stats from the worker, too.
>
> Yes, I think we can follow and be consistent with whatever way in which the
> other worker stats are printed. Will check.
>
> Note: Since there can be multiple separate Gather plans under a plan tree,
> I think we can show this info for each Gather plan.

The attached patch shows per-worker information, besides the cumulative
figure at the end of the plan. Attached is the complete output of an
aggregate parallel query on the tenk1 table (created using the regression
tests). You can see that the per-worker figures are under each of the
Gather plans.

We can show combined values of all the workers under a Gather plan as one
single value, but I guess if we just show per-worker values, we don't have
to additionally show a combined value. Comments?

Also, I have kept the per-worker info only for verbose=true. The Gather
plan's leader value is not shown. We also don't show leader-specific values
when we show the usual per-worker instrumentation values, so I think we are
consistent. Actually it is a bit tedious to collect only leader-specific
values, and I don't think it is worth trying for.

For workers, I am not collecting per-plan-node info, because we want to
show total worker figures, not per-plan-node figures. For the normal
instrumentation, we want to show per-node info. For leader-specific
instrumentation, we would need to store per-(Gather-)plan info.

Need to finalize the indentation and the other explain formats. One option
is to keep the per-worker JIT info on a single line, like how we print the
normal per-worker instrumentation:
Worker 1: actual time=20.971..35.975 rows=39991 loops=1

Below is a snippet:

 Aggregate  (cost=18656.00..18656.01 rows=1 width=8) (actual time=593.917..593.917 rows=1 loops=1)
   Output: count(*)
   ->  Hash Join  (cost=10718.00..18456.00 rows=80000 width=0) (actual time=496.650..593.883 rows=16 loops=1)
         Inner Unique: true
         Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
         ->  Gather  (cost=0.00..5918.00 rows=160000 width=8) (actual time=192.319..223.384 rows=160000 loops=1)
               Output: a.unique1, a.two
               Workers Planned: 4
               Workers Launched: 4
               Jit for Worker : 0
                 Functions: 2
                 Generation Time: 0.273 ms
                 Inlining: true
                 Inlining Time: 43.686 ms
                 Optimization: true
                 Optimization Time: 10.788 ms
                 Emission Time: 8.438 ms
               Jit for Worker : 1
                 Functions: 2
                 Generation Time: 0.293 ms
                 Inlining: true
                 Inlining Time: 72.587 ms
                 Optimization: true
                 Optimization Time: 10.386 ms
                 Emission Time: 8.115 ms
               ..........
 Planning Time: 0.548 ms
 Jit:
   Functions: 40
   Generation Time: 3.892 ms
   Inlining: true
   Inlining Time: 409.397 ms
   Optimization: true
   Optimization Time: 174.708 ms
   Emission Time: 91.785 ms
 Execution Time: 610.262 ms
(98 rows)

>> Wouldn't it be better to move the jit instrumentation to outside of the
>> context, to avoid having to do this? Or just cope with not having
>> instrumentation for the leader in this case? We'd kinda need to deal
>> with failure to create one anyway?
>
> Yeah, I think taking the instrumentation out of the context looks better.
> Will work on that.

Not yet done this. Will do now.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Amit Khandekar <amitdkhan.pg@gmail.com> writes:
> On 11 September 2018 at 14:50, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> On 10 September 2018 at 21:39, Andres Freund <andres@anarazel.de> wrote:
>>> On 2018-09-10 15:42:55 +0530, Amit Khandekar wrote:
>>>> I think we better show per-worker jit info also.

Just to throw a contrarian opinion into this: I find the current EXPLAIN
output for JIT to be insanely verbose already.

			regards, tom lane
On 14 September 2018 at 16:48, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 11 September 2018 at 14:50, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> On 10 September 2018 at 21:39, Andres Freund <andres@anarazel.de> wrote:
>>> Wouldn't it be better to move the jit instrumentation to outside of the
>>> context, to avoid having to do this? Or just cope with not having
>>> instrumentation for the leader in this case? We'd kinda need to deal
>>> with failure to create one anyway?
>>
>> Yeah, I think taking the instrumentation out of the context looks better.
>> Will work on that.
>
> Not yet done this. Will do now.

Attached v3 patch that does the above change.

The jit instrumentation counters are used by the provider code directly. So
I think even if we keep the instrumentation outside of the context, we
should let the resource manager handle deallocation of the instrumentation,
similar to how it deallocates the whole jit context. So we should allocate
the instrumentation in TopMemoryContext. Otherwise, if the instrumentation
is allocated in per-query context and the provider functions access it
after the Portal resource is cleaned up, the instrumentation would already
have been deallocated.

So for this, I thought we had better have two separate contexts: a base
context (i.e. the usual JitContext) and a provider-specific context.
JitContext has a new field provider_context pointing to LLVMJitContext, and
LLVMJitContext has a 'base' field pointing to the base context (i.e.
JitContext).

So in ExecParallelRetrieveJitInstrumentation(), where the jit context is
created if it's NULL, I now create just the base context. Later, on demand,
JitContext->provider_context will be allocated in llvm_compile(). This way,
we can release both the base context and the provider context during
ResourceOwnerRelease(), and at the same time continue to be able to access
the jit instrumentation from the provider.

-----------

BTW there is this code in llvm_compile_expr() on HEAD that does not handle
the !parent case correctly:

/* get or create JIT context */
if (parent && parent->state->es_jit)
{
    context = (LLVMJitContext *) parent->state->es_jit;
}
else
{
    context = llvm_create_context(parent->state->es_jit_flags);

    if (parent)
    {
        parent->state->es_jit = &context->base;
    }
}

Here, llvm_create_context() will crash if parent is NULL. Is it that parent
can never be NULL? I think so, because in jit_compile_expr() we skip
everything if there is no plan state. So probably there should be an
Assert(parent != NULL) rather than handling the parent check. Or else, we
need to decide what JIT flags to supply to llvm_create_context() if we let
the provider create the context without a plan state. For now, in the
changes I did to the above snippet, I have kept a TODO.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 18 September 2018 at 03:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Khandekar <amitdkhan.pg@gmail.com> writes:
>> I think we better show per-worker jit info also.
>
> Just to throw a contrarian opinion into this: I find the current EXPLAIN
> output for JIT to be insanely verbose already.

One option is to make the jit info displayed on a single line, with optimize
and inlining counters shown only when the corresponding optimization was
attempted. Currently they are shown even if they are disabled.

In the patch, I have kept the per-worker output verbose-only.

Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,

On 2018-09-17 17:50:15 -0400, Tom Lane wrote:
> Amit Khandekar <amitdkhan.pg@gmail.com> writes:
>> On 2018-09-10 15:42:55 +0530, Amit Khandekar wrote:
>>> I think we better show per-worker jit info also.
>
> Just to throw a contrarian opinion into this: I find the current EXPLAIN
> output for JIT to be insanely verbose already.

Hm, it'd have been nice to get that feedback a little bit earlier, I did
inquire...

Currently:

JIT:
  Functions: 2
  Generation Time: 0.680 ms
  Inlining: true
  Inlining Time: 7.591 ms
  Optimization: true
  Optimization Time: 20.522 ms
  Emission Time: 14.607 ms

How about making that:

JIT:
  Functions: 2
  Options: Inlining, Optimization
  Times (Total, Generation, Inlining, Optimization, Emission): 43.4 ms, 0.680 ms, 7.591 ms, 20.522 ms, 14.607 ms

or something similar?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-17 17:50:15 -0400, Tom Lane wrote:
>> Just to throw a contrarian opinion into this: I find the current EXPLAIN
>> output for JIT to be insanely verbose already.
>
> Hm, it'd have been nice to get that feedback a little bit earlier, I did
> inquire...
>
> Currently:
>
> JIT:
>   Functions: 2
>   Generation Time: 0.680 ms
>   Inlining: true
>   Inlining Time: 7.591 ms
>   Optimization: true
>   Optimization Time: 20.522 ms
>   Emission Time: 14.607 ms

Just to clarify, that seems perfectly fine for the "machine readable"
output formats. I'd just like fewer lines in the "human readable" output.

> How about making that:
>
> JIT:
>   Functions: 2
>   Options: Inlining, Optimization
>   Times (Total, Generation, Inlining, Optimization, Emission): 43.4 ms, 0.680 ms, 7.591 ms, 20.522 ms, 14.607 ms
>
> or something similar?

That's going in the right direction. Personally I'd make the last line
more like

  Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, emission 14.607 ms, total 43.4 ms

(total at the end seems more natural to me, YMMV). Also, the "options"
format you suggest here seems a bit too biased towards binary on/off
options --- what happens when there's a three-way option? So maybe that
line should be like

  Options: inlining on, optimization on

though I'm less sure about that part.

			regards, tom lane
On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2018-09-17 17:50:15 -0400, Tom Lane wrote:
>>> Just to throw a contrarian opinion into this: I find the current EXPLAIN
>>> output for JIT to be insanely verbose already.
>
> Just to clarify, that seems perfectly fine for the "machine readable"
> output formats. I'd just like fewer lines in the "human readable"
> output.

Yea, I do think that's a fair complaint.

>> How about making that:
>>
>> JIT:
>>   Functions: 2

FWIW, not that I want to do that now, but at some point it might make
sense to sub-divide this into things like number of "expressions",
"tuple deforming", "plans", ... Just mentioning that if somebody wants
to comment on reformatting this as well, if we're tinkering anyway.

>>   Options: Inlining, Optimization
>>   Times (Total, Generation, Inlining, Optimization, Emission): 43.4 ms, 0.680 ms, 7.591 ms, 20.522 ms, 14.607 ms
>>
>> or something similar?
>
> That's going in the right direction. Personally I'd make the last line
> more like
>
>   Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, emission 14.607 ms, total 43.4 ms

Yea, that's probably easier to read.

> (total at the end seems more natural to me, YMMV).

I kind of think doing it first is best, because that's usually the first
thing one wants to know.

> Also, the "options" format you suggest here seems a bit too biased
> towards binary on/off options --- what happens when there's a
> three-way option? So maybe that line should be like
>
>   Options: inlining on, optimization on
>
> though I'm less sure about that part.

I'm pretty certain you're right :). There's already arguments around
making optimization more gradual (akin to O1,2,3).

Greetings,

Andres Freund
On Thu, 20 Sep 2018 at 5:39, Andres Freund <andres@anarazel.de> wrote:
On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-09-17 17:50:15 -0400, Tom Lane wrote:
> >> Just to throw a contrarian opinion into this: I find the current EXPLAIN
> >> output for JIT to be insanely verbose already.
>
> > Hm, it'd have been nice to get that feedback a little bit earlier, I did
> > inquire...
>
> > Currently:
>
> > JIT:
> > Functions: 2
> > Generation Time: 0.680 ms
> > Inlining: true
> > Inlining Time: 7.591 ms
> > Optimization: true
> > Optimization Time: 20.522 ms
> > Emission Time: 14.607 ms
>
> Just to clarify, that seems perfectly fine for the "machine readable"
> output formats. I'd just like fewer lines in the "human readable"
> output.
Yea, I do think that's a fair complaint.
> > How about making that:
>
> > JIT:
> > Functions: 2
FWIW, not that I want to do that now, but at some point it might make
sense to sub-divide this into things like number of "expressions",
"tuple deforming", "plans", ... Just mentioning that if somebody wants
to comment on reformatting this as well, if we're tinkering anyway.
> > Options: Inlining, Optimization
> > Times (Total, Generation, Inlining, Optimization, Emission): 43.4 ms, 0.680 ms, 7.591 ms, 20.522 ms, 14.607 ms
+1
Pavel
>
> > or something similar?
>
> That's going in the right direction. Personally I'd make the last line
> more like
>
> Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, emission 14.607 ms, total 43.4 ms
Yea, that's probably easier to read.
> (total at the end seems more natural to me, YMMV).
I kind of think doing it first is best, because that's usually the first
thing one wants to know.
> Also, the "options" format you suggest here seems a bit too biased
> towards binary on/off options --- what happens when there's a
> three-way option? So maybe that line should be like
>
> Options: inlining on, optimization on
>
> though I'm less sure about that part.
I'm pretty certain you're right :). There's already arguments around
making optimization more gradual (akin to O1,2,3).
Greetings,
Andres Freund
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> JIT:
>>>   Functions: 2
>>>   Generation Time: 0.680 ms
>>>   Inlining: true
>>>   Inlining Time: 7.591 ms
>>>   Optimization: true
>>>   Optimization Time: 20.522 ms
>>>   Emission Time: 14.607 ms
[...]
>> How about making that:
>>> JIT:
>>>   Functions: 2
>
> FWIW, not that I want to do that now, but at some point it might make
> sense to sub-divide this into things like number of "expressions",
> "tuple deforming", "plans", ... Just mentioning that if somebody wants
> to comment on reformatting this as well, if we're tinkering anyway.

I'd actually think we'd maybe want some kind of 'verbose' mode which shows
exactly what got JIT'd and what didn't - one of the questions that I think
people will be asking is "why didn't X get JIT'd?" and I don't think that's
very easy to figure out currently.

>>>   Options: Inlining, Optimization
>>>   Times (Total, Generation, Inlining, Optimization, Emission): 43.4 ms, 0.680 ms, 7.591 ms, 20.522 ms, 14.607 ms
>>>
>>> or something similar?
>>
>> That's going in the right direction. Personally I'd make the last line
>> more like
>>
>>   Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, emission 14.607 ms, total 43.4 ms
>
> Yea, that's probably easier to read.

I tend to agree that it's easier to read but I'm not sure we need to quite
go that far in reducing the number of rows.

>> (total at the end seems more natural to me, YMMV).

I agree with this..

> I kind of think doing it first is best, because that's usually the first
> thing one wants to know.

and this, so what about:

  Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, emission 14.607 ms
  Total: 43.4 ms

Gets the total out there quick on the left where you're scanning down while
keeping the detailed info above for reviewing after.

>> Also, the "options" format you suggest here seems a bit too biased
>> towards binary on/off options --- what happens when there's a
>> three-way option? So maybe that line should be like
>>
>>   Options: inlining on, optimization on
>>
>> though I'm less sure about that part.
>
> I'm pretty certain you're right :). There's already arguments around
> making optimization more gradual (akin to O1,2,3).

That certainly sounds like it'd be very neat to have, though I wonder how
well we'll be able to automatically plan out which optimization level to
use when..

Thanks!

Stephen
On 2018-09-20 09:07:21 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
>> FWIW, not that I want to do that now, but at some point it might make
>> sense to sub-divide this into things like number of "expressions",
>> "tuple deforming", "plans", ... Just mentioning that if somebody wants
>> to comment on reformatting this as well, if we're tinkering anyway.
>
> I'd actually think we'd maybe want some kind of 'verbose' mode which
> shows exactly what got JIT'd and what didn't - one of the questions that
> I think people will be asking is "why didn't X get JIT'd?" and I don't
> think that's very easy to figure out currently.

That seems largely a separate discussion / feature though, right? I'm not
entirely clear what precisely you mean with "why didn't X get JIT'd?" -
currently that's a whole-query decision.

>> I'm pretty certain you're right :). There's already arguments around
>> making optimization more gradual (akin to O1,2,3).
>
> That certainly sounds like it'd be very neat to have, though I wonder how
> well we'll be able to automatically plan out which optimization level to
> use when..

Well, that's not really different from having to decide whether to use JIT
or not. I suspect that once / if we get caching and/or background JIT
compilation, we can get a lot more creative around this.

Greetings,

Andres Freund
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2018-09-20 09:07:21 -0400, Stephen Frost wrote:
>> I'd actually think we'd maybe want some kind of 'verbose' mode which
>> shows exactly what got JIT'd and what didn't - one of the questions that
>> I think people will be asking is "why didn't X get JIT'd?" and I don't
>> think that's very easy to figure out currently.
>
> That seems largely a separate discussion / feature though, right? I'm
> not entirely clear what precisely you mean with "why didn't X get
> JIT'd?" - currently that's a whole-query decision.

There are things that can't be JIT'd currently and, I'm thinking anyway,
we'd JIT what we are able to and report out what was JIT'd in some fashion,
which would inform a user what was able to be JIT'd and what wasn't. I'll
admit, perhaps this is more of a debugging tool than something that an
end-user would find useful, unless there's possibly some way that a user
could affect what happens.

>>> I'm pretty certain you're right :). There's already arguments around
>>> making optimization more gradual (akin to O1,2,3).
>>
>> That certainly sounds like it'd be very neat to have, though I wonder how
>> well we'll be able to automatically plan out which optimization level to
>> use when..
>
> Well, that's not really different from having to decide whether to use
> JIT or not. I suspect that once / if we get caching and/or background
> JIT compilation, we can get a lot more creative around this.

Yeah, I definitely understand that; caching, in particular, I would expect
to play a very large role.

Thanks!

Stephen
Hi,

On 2018-09-19 20:39:22 -0700, Andres Freund wrote:
> On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
>> That's going in the right direction. Personally I'd make the last line
>> more like
>>
>>   Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, emission 14.607 ms, total 43.4 ms
>
> Yea, that's probably easier to read.

I'm wondering about upper-casing the individual times (and options) - we're
largely upper-casing properties, and for json/xml output each would still
be a property. Seems a tad bit more consistent. I now have:

FORMAT text:

JIT:
  Functions: 2
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.298 ms, Inlining 2.250 ms, Optimization 5.797 ms, Emission 5.246 ms, Total 13.591 ms

FORMAT xml:

<JIT>
  <Functions>2</Functions>
  <Options>
    <Inlining>true</Inlining>
    <Optimization>true</Optimization>
    <Expressions>true</Expressions>
    <Deforming>true</Deforming>
  </Options>
  <Timing>
    <Generation>0.651</Generation>
    <Inlining>2.260</Inlining>
    <Optimization>14.752</Optimization>
    <Emission>7.764</Emission>
    <Total>25.427</Total>
  </Timing>
</JIT>

FORMAT json:

"JIT": {
  "Functions": 2,
  "Options": {
    "Inlining": true,
    "Optimization": true,
    "Expressions": true,
    "Deforming": true
  },
  "Timing": {
    "Generation": 0.238,
    "Inlining": 0.807,
    "Optimization": 4.661,
    "Emission": 4.236,
    "Total": 9.942
  }
},

>> (total at the end seems more natural to me, YMMV).
>
> I kind of think doing it first is best, because that's usually the first
> thing one wants to know.
>
>> Also, the "options" format you suggest here seems a bit too biased
>> towards binary on/off options --- what happens when there's a
>> three-way option? So maybe that line should be like
>>
>>   Options: inlining on, optimization on
>>
>> though I'm less sure about that part.

Now that space is less of a concern, I added expressions and deforming as
additional options - it seems reasonable to have all the PGJIT_* options
imo. Btw, I chose true/false rather than on/off, to be consistent with
ExplainPropertyBool - but I've no strong feelings about it.

Greetings,

Andres Freund
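For reference, the three renderings above correspond to EXPLAIN's output formats, e.g. (a sketch; any ANALYZE-able query will do):

    EXPLAIN (ANALYZE, FORMAT text) SELECT count(*) FROM pg_class;
    EXPLAIN (ANALYZE, FORMAT xml)  SELECT count(*) FROM pg_class;
    EXPLAIN (ANALYZE, FORMAT json) SELECT count(*) FROM pg_class;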
On 2018-09-18 10:03:02 +0530, Amit Khandekar wrote:
> Attached v3 patch that does the above change.

Attached is a revised version of that patch. I've changed quite a few
things:

- I've reverted the split of "base" and "provider specific" contexts - I
  don't think it really buys us anything here.

- I've reverted the context creation changes - instead of creating a
  context in the leader just to store instrumentation in the worker,
  there's now a new EState->es_jit_combined_instr.

- That also means worker instrumentation doesn't get folded into the
  leader's instrumentation. This seems good for the future and for
  extensions - it's not actually "linear" time that's spent doing JIT
  in workers (& leader), as all of that work happens in parallel. Being
  able to disentangle that seems important.

- Only allocate / transport JIT instrumentation if instrumentation is
  enabled.

- Smaller cleanups.

Forcing parallelism and JIT to be on, I get:

postgres[20216][1]=# EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT count(*) FROM pg_class;

Finalize Aggregate  (cost=15.43..15.44 rows=1 width=8) (actual time=183.282..183.282 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=13
  ->  Gather  (cost=15.41..15.42 rows=2 width=8) (actual time=152.835..152.904 rows=2 loops=1)
        Output: (PARTIAL count(*))
        Workers Planned: 2
        Workers Launched: 2
        JIT for worker 0:
          Functions: 2
          Options: Inlining true, Optimization true, Expressions true, Deforming true
          Timing: Generation 0.480 ms, Inlining 78.789 ms, Optimization 5.797 ms, Emission 5.554 ms, Total 90.620 ms
        JIT for worker 1:
          Functions: 2
          Options: Inlining true, Optimization true, Expressions true, Deforming true
          Timing: Generation 0.475 ms, Inlining 78.632 ms, Optimization 5.954 ms, Emission 5.900 ms, Total 90.962 ms
        Buffers: shared hit=13
        ->  Partial Aggregate  (cost=15.41..15.42 rows=1 width=8) (actual time=90.608..90.609 rows=1 loops=2)
              Output: PARTIAL count(*)
              Buffers: shared hit=13
              Worker 0: actual time=90.426..90.426 rows=1 loops=1
                Buffers: shared hit=7
              Worker 1: actual time=90.791..90.792 rows=1 loops=1
                Buffers: shared hit=6
              ->  Parallel Seq Scan on pg_catalog.pg_class  (cost=0.00..14.93 rows=193 width=0) (actual time=0.006..0.048 rows=193 loops=2)
                    Output: relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, relpersistence, relkind, relnatts, relchecks, relhasoids, relhasrules, relhastriggers, relhassubclass, relrowsecurity, relfo
                    Buffers: shared hit=13
                    Worker 0: actual time=0.006..0.047 rows=188 loops=1
                      Buffers: shared hit=7
                    Worker 1: actual time=0.006..0.048 rows=198 loops=1
                      Buffers: shared hit=6
Planning Time: 0.152 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.955 ms, Inlining 157.422 ms, Optimization 11.751 ms, Emission 11.454 ms, Total 181.582 ms
Execution Time: 184.197 ms

Instead of "JIT for worker 0" we could instead do "Worker 0: JIT", but
I'm not sure that's better, given that the JIT group is multi-line
output.

Currently structured formats show this as:

"JIT": {
  "Worker Number": 1,
  "Functions": 2,
  "Options": {
    "Inlining": true,
    "Optimization": true,
    "Expressions": true,
    "Deforming": true
  },
  "Timing": {
    "Generation": 0.374,
    "Inlining": 70.341,
    "Optimization": 8.006,
    "Emission": 7.670,
    "Total": 86.390
  }
},

This needs a bit more polish tomorrow, but I'm starting to like where
this is going. Comments?

Greetings,

Andres Freund
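[A sketch of session settings that should coax a similar parallel + JIT
plan out of a stock build; the exact values are assumptions - any
settings that make both a parallel plan and JIT win the cost
comparison will do:

SET max_parallel_workers_per_gather = 2;
SET parallel_setup_cost = 0;           -- make the planner pick Gather
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;  -- even for a tiny catalog table
SET jit = on;
SET jit_above_cost = 0;                -- force JIT for a cheap query
SET jit_inline_above_cost = 0;
SET jit_optimize_above_cost = 0;

EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT count(*) FROM pg_class;
]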
Hi,

On 2018-09-25 01:47:49 -0700, Andres Freund wrote:
> Forcing parallelism and JIT to be on, I get:
>
> postgres[20216][1]=# EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT count(*) FROM pg_class;
>
> Finalize Aggregate  (cost=15.43..15.44 rows=1 width=8) (actual time=183.282..183.282 rows=1 loops=1)
>   Output: count(*)
>   Buffers: shared hit=13
>   ->  Gather  (cost=15.41..15.42 rows=2 width=8) (actual time=152.835..152.904 rows=2 loops=1)
>         Output: (PARTIAL count(*))
>         Workers Planned: 2
>         Workers Launched: 2
>         JIT for worker 0:
>           Functions: 2
>           Options: Inlining true, Optimization true, Expressions true, Deforming true
>           Timing: Generation 0.480 ms, Inlining 78.789 ms, Optimization 5.797 ms, Emission 5.554 ms, Total 90.620 ms
>         JIT for worker 1:
>           Functions: 2
>           Options: Inlining true, Optimization true, Expressions true, Deforming true
>           Timing: Generation 0.475 ms, Inlining 78.632 ms, Optimization 5.954 ms, Emission 5.900 ms, Total 90.962 ms
>         Buffers: shared hit=13
>         ->  Partial Aggregate  (cost=15.41..15.42 rows=1 width=8) (actual time=90.608..90.609 rows=1 loops=2)
>               Output: PARTIAL count(*)
>               Buffers: shared hit=13
>               Worker 0: actual time=90.426..90.426 rows=1 loops=1
>                 Buffers: shared hit=7
>               Worker 1: actual time=90.791..90.792 rows=1 loops=1
>                 Buffers: shared hit=6
>               ->  Parallel Seq Scan on pg_catalog.pg_class  (cost=0.00..14.93 rows=193 width=0) (actual time=0.006..0.048 rows=193 loops=2)
>                     Output: relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, relpersistence, relkind, relnatts, relchecks, relhasoids, relhasrules, relhastriggers, relhassubclass, relrowsecurity, relfo
>                     Buffers: shared hit=13
>                     Worker 0: actual time=0.006..0.047 rows=188 loops=1
>                       Buffers: shared hit=7
>                     Worker 1: actual time=0.006..0.048 rows=198 loops=1
>                       Buffers: shared hit=6
> Planning Time: 0.152 ms
> JIT:
>   Functions: 4
>   Options: Inlining true, Optimization true, Expressions true, Deforming true
>   Timing: Generation 0.955 ms, Inlining 157.422 ms, Optimization 11.751 ms, Emission 11.454 ms, Total 181.582 ms
> Execution Time: 184.197 ms

With parallelism on, this is the aggregated cost of doing JIT
compilation. I wonder if, in VERBOSE mode, we should separately display
the cost of JIT for the leader. Comments?

- Andres
Hi,

On 2018-09-25 12:50:34 -0700, Andres Freund wrote:
> On 2018-09-25 01:47:49 -0700, Andres Freund wrote:
> > Planning Time: 0.152 ms
> > JIT:
> >   Functions: 4
> >   Options: Inlining true, Optimization true, Expressions true, Deforming true
> >   Timing: Generation 0.955 ms, Inlining 157.422 ms, Optimization 11.751 ms, Emission 11.454 ms, Total 181.582 ms
> > Execution Time: 184.197 ms
>
> With parallelism on, this is the aggregated cost of doing JIT
> compilation. I wonder if, in VERBOSE mode, we should separately display
> the cost of JIT for the leader. Comments?

I've pushed the change without that bit - it's just a few additional
lines if we want to change that.

Greetings,

Andres Freund
On Tue, 25 Sep 2018 at 14:17, Andres Freund <andres@anarazel.de> wrote:
>
> On 2018-09-18 10:03:02 +0530, Amit Khandekar wrote:
> > Attached v3 patch that does the above change.
>
> Attached is a revised version of that patch. I've changed quite a few
> things:
> - I've reverted the split of "base" and "provider specific" contexts - I
>   don't think it really buys us anything here.

The idea was to have a single estate field that accumulates all the
JIT counters of the leader as well as the workers. I see that you want
to delay the merging of worker and backend counters until the end of
query execution. More points on this in the bottom section.

> - I've reverted the context creation changes - instead of creating a
>   context in the leader just to store instrumentation in the worker,
>   there's now a new EState->es_jit_combined_instr.

The context created in the leader was a light context, involving only
the resource owner stuff, and not the provider initialization.

> - That also means worker instrumentation doesn't get folded into the
>   leader's instrumentation.

You mean the worker instrumentation doesn't get folded into the leader
until the query execution ends, right? In the committed code, I see
that we now merge the leader instrumentation into the combined worker
instrumentation in standard_ExecutorEnd().

>   This seems good for the future and for
>   extensions - it's not actually "linear" time that's spent doing JIT
>   in workers (& leader), as all of that work happens in parallel.
>   Being able to disentangle that seems important.

Ok. Your point is: we should have the backend and workers info stored
in two separate fields, and combine them only when we need it, so that
we will be in a position to show combined workers-only info separately
in the future. From the code, it looks like es_jit_combined_instr
stores combined worker info not just from a single Gather node, but
from all the Gather nodes in the plan. If we want to have separate
worker info, I am not sure it makes sense to combine workers from two
separate Gather nodes, because these two sets of workers are
unrelated, aren't they?

> This needs a bit more polish tomorrow, but I'm starting to like where
> this is going. Comments?

Yeah, I think the plan output looks reasonably compact now. Thanks.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 2018-09-26 23:45:35 +0530, Amit Khandekar wrote:
> On Tue, 25 Sep 2018 at 14:17, Andres Freund <andres@anarazel.de> wrote:
> > On 2018-09-18 10:03:02 +0530, Amit Khandekar wrote:
> > > Attached v3 patch that does the above change.
> >
> > Attached is a revised version of that patch. I've changed quite a few
> > things:
> > - I've reverted the split of "base" and "provider specific" contexts - I
> >   don't think it really buys us anything here.
>
> The idea was to have a single estate field that accumulates all the
> JIT counters of the leader as well as the workers. I see that you want
> to delay the merging of worker and backend counters until the end of
> query execution. More points on this in the bottom section.

No, I never want to merge things into the leader's stats.

> > - I've reverted the context creation changes - instead of creating a
> >   context in the leader just to store instrumentation in the worker,
> >   there's now a new EState->es_jit_combined_instr.
>
> The context created in the leader was a light context, involving only
> the resource owner stuff, and not the provider initialization.

I know, but it added a fair bit of infrastructure and complications
just for that - and I see very, very little reason that that'd ever be
a necessary separation.

> > - That also means worker instrumentation doesn't get folded into the
> >   leader's instrumentation.
>
> You mean the worker instrumentation doesn't get folded into the leader
> until the query execution ends, right? In the committed code, I see
> that we now merge the leader instrumentation into the combined worker
> instrumentation in standard_ExecutorEnd().

No, I mean it *never* gets folded into the leader's instrumentation.
There's a *separate* instrumentation field, where they do get combined.
But that still allows code to print out the leader's stats alone.

> >   This seems good for the future and for
> >   extensions - it's not actually "linear" time that's spent doing JIT
> >   in workers (& leader), as all of that work happens in parallel.
> >   Being able to disentangle that seems important.
>
> Ok. Your point is: we should have the backend and workers info stored
> in two separate fields, and combine them only when we need it, so that
> we will be in a position to show combined workers-only info separately
> in the future. From the code, it looks like es_jit_combined_instr
> stores combined worker info not just from a single Gather node, but
> from all the Gather nodes in the plan. If we want to have separate
> worker info, I am not sure it makes sense to combine workers from two
> separate Gather nodes, because these two sets of workers are
> unrelated, aren't they?

Well, we now have all the individual stats around in an unmodified
manner. We could print both the aggregate and individualized stats.

Greetings,

Andres Freund
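[To make the data flow being debated here concrete - leader counters
kept untouched, per-worker counters folded into a separate combined
field on demand - a rough sketch in backend-style C. All struct and
function names below are illustrative assumptions for exposition, not
PostgreSQL's actual definitions:

#include "postgres.h"
#include "portability/instr_time.h"

/* Hypothetical per-participant JIT counters, one set per backend. */
typedef struct JitCounters
{
    size_t      created_functions;      /* number of JIT-compiled functions */
    instr_time  generation_counter;     /* time spent generating IR */
    instr_time  inlining_counter;       /* time spent inlining */
    instr_time  optimization_counter;   /* time spent optimizing */
    instr_time  emission_counter;       /* time spent emitting machine code */
} JitCounters;

/*
 * Accumulate one participant's counters into a running total.  The
 * leader's own JitCounters are never modified; a worker's counters are
 * added into a separate "combined" total, so either view - leader-only
 * or aggregate - can still be printed later.
 */
static void
jit_counters_add(JitCounters *dst, const JitCounters *add)
{
    dst->created_functions += add->created_functions;
    INSTR_TIME_ADD(dst->generation_counter, add->generation_counter);
    INSTR_TIME_ADD(dst->inlining_counter, add->inlining_counter);
    INSTR_TIME_ADD(dst->optimization_counter, add->optimization_counter);
    INSTR_TIME_ADD(dst->emission_counter, add->emission_counter);
}
]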
On Tue, Sep 25, 2018 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
> I've pushed the change without that bit - it's just a few additional
> lines if we want to change that.
It seems that since this commit, JIT statistics are now only being printed for parallel query plans. This is due to ExplainPrintJIT being called before ExecutorEnd, so in a non-parallel query, queryDesc->estate->es_jit_combined_instr will never get set.
Attached a patch that instead introduces a new ExplainPrintJITSummary method that summarizes the statistics before they get printed.
I've also removed an (I believe) unnecessary "if (estate->es_instrument)" check that prevented EXPLAIN without ANALYZE from showing whether JIT would be used or not.
In addition, this updates a previously missed section of the documentation to show the new stats output format.
Best,
Lukas
Lukas Fittl
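[Building on the accumulator sketch earlier in the thread, the shape of
the fix Lukas describes might look roughly like this. The name
ExplainPrintJITSummary comes from the patch description; every other
name and the body itself are illustrative assumptions, not the
committed code:

/*
 * Sketch only: combine leader and worker JIT statistics at EXPLAIN
 * time, instead of relying on ExecutorEnd having populated a combined
 * field beforehand (the bug described above).
 */
static void
ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc)
{
    EState     *estate = queryDesc->estate;
    JitCounters total = {0};

    /* Nothing to print if JIT was never considered for this query. */
    if (estate->es_jit_flags == 0)
        return;

    /* The leader's own counters, if the leader compiled anything. */
    if (estate->es_jit != NULL)
        jit_counters_add(&total, &estate->es_jit->counters);

    /* Counters accumulated from parallel workers (field name assumed). */
    if (estate->es_jit_worker_stats != NULL)
        jit_counters_add(&total, estate->es_jit_worker_stats);

    /* Hypothetical helper emitting the "JIT:" group shown upthread. */
    explain_print_jit_counters(es, estate->es_jit_flags, &total);
}
]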
Hi,

On 2018-10-01 00:32:18 -0700, Lukas Fittl wrote:
> On Tue, Sep 25, 2018 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
> > I've pushed the change without that bit - it's just a few additional
> > lines if we want to change that.
>
> It seems that since this commit, JIT statistics are now only being
> printed for parallel query plans. This is due to ExplainPrintJIT being
> called before ExecutorEnd, so in a non-parallel query,
> queryDesc->estate->es_jit_combined_instr will never get set.

Ugh.

> Attached a patch that instead introduces a new ExplainPrintJITSummary
> method that summarizes the statistics before they get printed.

Yea, I had something like that, and somehow concluded that it wasn't
needed, and moved it to the wrong place :/.

> I've also removed an (I believe) unnecessary "if (estate->es_instrument)"
> check that prevented EXPLAIN without ANALYZE from showing whether JIT
> would be used or not.
>
> In addition, this updates a previously missed section of the
> documentation to show the new stats output format.

Thanks a lot for the patch! I've pushed it with some mild changes
(renamed es_jit_combined_stats to worker_stats; changed
ExplainPrintJITSummary to not look at es_jit, but at es_jit_flags, as
theoretically es_jit might not be allocated; very minor comment
changes).

- Andres