Thread: effect of JIT tuple deform?

effect of JIT tuple deform?

From
Pavel Stehule
Date:
Hi

I try to measure effect of JIT tuple deform and I don't see any possible effect.

Is it this feature active in master branch?

Is possible to see this feature in EXPLAIN ANALYZE?

Regards

Pavel

Re: effect of JIT tuple deform?

From
Pavel Stehule
Date:


2018-06-23 8:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

I try to measure effect of JIT tuple deform and I don't see any possible effect.

Is it this feature active in master branch?

Is possible to see this feature in EXPLAIN ANALYZE?

Unfortunately I got slowdown

0. shared buffers = 1GB
1. create table with 50 int columns
2. insert into this table 4M rows

postgres=# \dt+ wt
                   List of relations
+--------+------+-------+-------+--------+-------------+
| Schema | Name | Type  | Owner |  Size  | Description |
+--------+------+-------+-------+--------+-------------+
| public | wt   | table | pavel | 893 MB |             |
+--------+------+-------+-------+--------+-------------+
(1 row)


default setting

postgres=# explain analyze select sum(c45) from wt;
+------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                QUERY PLAN                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Finalize Aggregate  (cost=136120.69..136120.70 rows=1 width=8) (actual time=879.547..879.547 rows=1 loops=1)                             |
|   ->  Gather  (cost=136120.47..136120.68 rows=2 width=8) (actual time=879.514..879.538 rows=3 loops=1)                                   |
|         Workers Planned: 2                                                                                                               |
|         Workers Launched: 2                                                                                                              |
|         ->  Partial Aggregate  (cost=135120.47..135120.48 rows=1 width=8) (actual time=850.283..850.284 rows=1 loops=3)                  |
|               ->  Parallel Seq Scan on wt  (cost=0.00..130953.77 rows=1666678 width=4) (actual time=0.071..223.338 rows=1333347 loops=3) |
| Planning Time: 0.158 ms                                                                                                                  |
| JIT:                                                                                                                                     |
|   Functions: 6                                                                                                                           |
|   Generation Time: 4.267 ms                                                                                                              |
|   Inlining: false                                                                                                                        |
|   Inlining Time: 0.000 ms                                                                                                                |
|   Optimization: false                                                                                                                    |
|   Optimization Time: 2.472 ms                                                                                                            |
|   Emission Time: 15.929 ms                                                                                                               |
| Execution Time: 899.496 ms                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
(16 rows)

postgres=# set jit_tuple_deforming to off;
SET
postgres=# explain analyze select sum(c45) from wt;
+------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                QUERY PLAN                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Finalize Aggregate  (cost=136120.69..136120.70 rows=1 width=8) (actual time=743.667..743.667 rows=1 loops=1)                             |
|   ->  Gather  (cost=136120.47..136120.68 rows=2 width=8) (actual time=743.654..743.657 rows=3 loops=1)                                   |
|         Workers Planned: 2                                                                                                               |
|         Workers Launched: 2                                                                                                              |
|         ->  Partial Aggregate  (cost=135120.47..135120.48 rows=1 width=8) (actual time=715.532..715.533 rows=1 loops=3)                  |
|               ->  Parallel Seq Scan on wt  (cost=0.00..130953.77 rows=1666678 width=4) (actual time=0.067..216.245 rows=1333347 loops=3) |
| Planning Time: 0.157 ms                                                                                                                  |
| JIT:                                                                                                                                     |
|   Functions: 4                                                                                                                           |
|   Generation Time: 1.989 ms                                                                                                              |
|   Inlining: false                                                                                                                        |
|   Inlining Time: 0.000 ms                                                                                                                |
|   Optimization: false                                                                                                                    |
|   Optimization Time: 0.449 ms                                                                                                            |
|   Emission Time: 7.254 ms                                                                                                                |
| Execution Time: 761.549 ms                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
(16 rows)


When jit_tuple_deforming is enabled, the query is slower about 100ms, looks like performance regression





Regards

Pavel

Re: effect of JIT tuple deform?

From
Dmitry Dolgov
Date:
> On 23 June 2018 at 08:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2018-06-23 8:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
>>
>> Hi
>>
>> I try to measure effect of JIT tuple deform and I don't see any possible
>> effect.
>>
>> Is it this feature active in master branch?
>>
>> Is possible to see this feature in EXPLAIN ANALYZE?
>
>
> Unfortunately I got slowdown
>
> 0. shared buffers = 1GB
> 1. create table with 50 int columns
> 2. insert into this table 4M rows

Hi,

Looks like I can reproduce the situation you're talking about (with some minor
differences)

=# explain analyze select sum(data45) from test_deform;
                                QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate
 (cost=127097.71..127097.72 rows=1 width=8)
 (actual time=813.957..813.957 rows=1 loops=1)
   ->  Gather
       (cost=127097.50..127097.71 rows=2 width=8)
       (actual time=813.946..813.950 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate
             (cost=126097.50..126097.51 rows=1 width=8)
             (actual time=802.585..802.585 rows=1 loops=3)
               ->  Parallel Seq Scan on test_deform
                   (cost=0.00..121930.80 rows=1666680 width=4)
                   (actual time=0.040..207.326 rows=1333333 loops=3)
 Planning Time: 0.212 ms
 JIT:
   Functions: 6
   Generation Time: 3.076 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 1.328 ms
   Emission Time: 8.601 ms
 Execution Time: 820.127 ms
(16 rows)

=# set jit_tuple_deforming to off;

=# explain analyze select sum(data45) from test_deform;
                                QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate
 (cost=127097.71..127097.72 rows=1 width=8)
 (actual time=742.578..742.578 rows=1 loops=1)
   ->  Gather
       (cost=127097.50..127097.71 rows=2 width=8)
       (actual time=742.529..742.569 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate
             (cost=126097.50..126097.51 rows=1 width=8)
             (actual time=727.876..727.876 rows=1 loops=3)
               ->  Parallel Seq Scan on test_deform
                   (cost=0.00..121930.80 rows=1666680 width=4)
                   (actual time=0.044..204.097 rows=1333333 loops=3)
 Planning Time: 0.361 ms
 JIT:
   Functions: 4
   Generation Time: 2.840 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 0.751 ms
   Emission Time: 6.436 ms
 Execution Time: 749.827 ms
(16 rows)

But at the same time on the bigger dataset (20M rows instead of 4M) the very
same query gets better:

=# explain analyze select sum(data45) from test_deform;
                                QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate
 (cost=631482.92..631482.93 rows=1 width=8)
 (actual time=2350.288..2350.288 rows=1 loops=1)
   ->  Gather
       (cost=631482.71..631482.92 rows=2 width=8)
       (actual time=2350.276..2350.279 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate
             (cost=630482.71..630482.72 rows=1 width=8)
             (actual time=2328.378..2328.379 rows=1 loops=3)
               ->  Parallel Seq Scan on test_deform
                   (cost=0.00..609649.37 rows=8333337 width=4)
                   (actual time=0.027..1175.960 rows=6666667 loops=3)
 Planning Time: 0.600 ms
 JIT:
   Functions: 6
   Generation Time: 3.661 ms
   Inlining: true
   Inlining Time: 65.373 ms
   Optimization: true
   Optimization Time: 120.885 ms
   Emission Time: 58.836 ms
 Execution Time: 2543.280 ms
(16 rows)

=# set jit_tuple_deforming to off;

=# explain analyze select sum(data45) from test_deform;
                                QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate
 (cost=631482.92..631482.93 rows=1 width=8)
 (actual time=3616.977..3616.977 rows=1 loops=1)
   ->  Gather
       (cost=631482.71..631482.92 rows=2 width=8)
       (actual time=3616.959..3616.971 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate
             (cost=630482.71..630482.72 rows=1 width=8)
             (actual time=3593.220..3593.220 rows=1 loops=3)
               ->  Parallel Seq Scan on test_deform
                   (cost=0.00..609649.37 rows=8333337 width=4)
                   (actual time=0.049..1027.353 rows=6666667 loops=3)
 Planning Time: 0.149 ms
 JIT:
   Functions: 4
   Generation Time: 1.803 ms
   Inlining: true
   Inlining Time: 23.529 ms
   Optimization: true
   Optimization Time: 18.843 ms
   Emission Time: 9.307 ms
 Execution Time: 3625.674 ms
(16 rows)

`perf diff` indeed shows that in the first case (with the 4M rows dataset) the
jitted version has some noticeable delta for one call, and unfortunately so far
I couldn't figure out which one exactly because of JIT (btw, who can explain
how to see a correct full `perf report` in this case? Somehow `perf
inject --jit -o
perf.data.jitted` and jit_profiling_support didn't help).

But since on the bigger dataset I've got expected results, maybe it's just a
sign that JIT kicks in too early in this case and what's necessary is to adjust
jit_above_cost/jit_optimize_above_cost/jit_inline_above_cost?


Re: effect of JIT tuple deform?

From
Andres Freund
Date:
Hi,

On 2018-06-24 22:32:07 +0200, Dmitry Dolgov wrote:
> `perf diff` indeed shows that in the first case (with the 4M rows dataset) the
> jitted version has some noticeable delta for one call, and unfortunately so far
> I couldn't figure out which one exactly because of JIT (btw, who can explain
> how to see a correct full `perf report` in this case? Somehow `perf
> inject --jit -o
> perf.data.jitted` and jit_profiling_support didn't help).

jit_profiling_support currently requires a patch (that I'm about to
merge into their trunk) in LLVM.  I've posted it a couple times to the
list.


> But since on the bigger dataset I've got expected results, maybe it's just a
> sign that JIT kicks in too early in this case and what's necessary is to adjust
> jit_above_cost/jit_optimize_above_cost/jit_inline_above_cost?

Yea, that's very likely the problem.  I wonder if we should multiply the
cost w/ the number of targeted workers to reduce problems with
parallelism...

Greetings,

Andres Freund


Re: effect of JIT tuple deform?

From
Pavel Stehule
Date:


2018-06-24 22:32 GMT+02:00 Dmitry Dolgov <9erthalion6@gmail.com>:
> On 23 June 2018 at 08:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2018-06-23 8:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
>>
>> Hi
>>
>> I try to measure effect of JIT tuple deform and I don't see any possible
>> effect.
>>
>> Is it this feature active in master branch?
>>
>> Is possible to see this feature in EXPLAIN ANALYZE?
>
>
> Unfortunately I got slowdown
>
> 0. shared buffers = 1GB
> 1. create table with 50 int columns
> 2. insert into this table 4M rows

Hi,

Looks like I can reproduce the situation you're talking about (with some minor
differences)

=# explain analyze select sum(data45) from test_deform;
                                QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate
 (cost=127097.71..127097.72 rows=1 width=8)
 (actual time=813.957..813.957 rows=1 loops=1)
   ->  Gather
       (cost=127097.50..127097.71 rows=2 width=8)
       (actual time=813.946..813.950 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate
             (cost=126097.50..126097.51 rows=1 width=8)
             (actual time=802.585..802.585 rows=1 loops=3)
               ->  Parallel Seq Scan on test_deform
                   (cost=0.00..121930.80 rows=1666680 width=4)
                   (actual time=0.040..207.326 rows=1333333 loops=3)
 Planning Time: 0.212 ms
 JIT:
   Functions: 6
   Generation Time: 3.076 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 1.328 ms
   Emission Time: 8.601 ms
 Execution Time: 820.127 ms
(16 rows)

=# set jit_tuple_deforming to off;

=# explain analyze select sum(data45) from test_deform;
                                QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate
 (cost=127097.71..127097.72 rows=1 width=8)
 (actual time=742.578..742.578 rows=1 loops=1)
   ->  Gather
       (cost=127097.50..127097.71 rows=2 width=8)
       (actual time=742.529..742.569 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate
             (cost=126097.50..126097.51 rows=1 width=8)
             (actual time=727.876..727.876 rows=1 loops=3)
               ->  Parallel Seq Scan on test_deform
                   (cost=0.00..121930.80 rows=1666680 width=4)
                   (actual time=0.044..204.097 rows=1333333 loops=3)
 Planning Time: 0.361 ms
 JIT:
   Functions: 4
   Generation Time: 2.840 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 0.751 ms
   Emission Time: 6.436 ms
 Execution Time: 749.827 ms
(16 rows)

But at the same time on the bigger dataset (20M rows instead of 4M) the very
same query gets better:

=# explain analyze select sum(data45) from test_deform;
                                QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate
 (cost=631482.92..631482.93 rows=1 width=8)
 (actual time=2350.288..2350.288 rows=1 loops=1)
   ->  Gather
       (cost=631482.71..631482.92 rows=2 width=8)
       (actual time=2350.276..2350.279 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate
             (cost=630482.71..630482.72 rows=1 width=8)
             (actual time=2328.378..2328.379 rows=1 loops=3)
               ->  Parallel Seq Scan on test_deform
                   (cost=0.00..609649.37 rows=8333337 width=4)
                   (actual time=0.027..1175.960 rows=6666667 loops=3)
 Planning Time: 0.600 ms
 JIT:
   Functions: 6
   Generation Time: 3.661 ms
   Inlining: true
   Inlining Time: 65.373 ms
   Optimization: true
   Optimization Time: 120.885 ms
   Emission Time: 58.836 ms
 Execution Time: 2543.280 ms
(16 rows)

=# set jit_tuple_deforming to off;

=# explain analyze select sum(data45) from test_deform;
                                QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate
 (cost=631482.92..631482.93 rows=1 width=8)
 (actual time=3616.977..3616.977 rows=1 loops=1)
   ->  Gather
       (cost=631482.71..631482.92 rows=2 width=8)
       (actual time=3616.959..3616.971 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate
             (cost=630482.71..630482.72 rows=1 width=8)
             (actual time=3593.220..3593.220 rows=1 loops=3)
               ->  Parallel Seq Scan on test_deform
                   (cost=0.00..609649.37 rows=8333337 width=4)
                   (actual time=0.049..1027.353 rows=6666667 loops=3)
 Planning Time: 0.149 ms
 JIT:
   Functions: 4
   Generation Time: 1.803 ms
   Inlining: true
   Inlining Time: 23.529 ms
   Optimization: true
   Optimization Time: 18.843 ms
   Emission Time: 9.307 ms
 Execution Time: 3625.674 ms
(16 rows)

`perf diff` indeed shows that in the first case (with the 4M rows dataset) the
jitted version has some noticeable delta for one call, and unfortunately so far
I couldn't figure out which one exactly because of JIT (btw, who can explain
how to see a correct full `perf report` in this case? Somehow `perf
inject --jit -o
perf.data.jitted` and jit_profiling_support didn't help).

But since on the bigger dataset I've got expected results, maybe it's just a
sign that JIT kicks in too early in this case and what's necessary is to adjust
jit_above_cost/jit_optimize_above_cost/jit_inline_above_cost?

maybe llvm does real compilation too late. It is too strange, because I though about JIT cost like initial (fixed) costs. Now, it looks so this cost is related to row numbers, and then the situation is much more complex.

Regards

Pavel

Re: effect of JIT tuple deform?

From
Andres Freund
Date:
On 2018-06-25 05:32:37 +0200, Pavel Stehule wrote:
> 2018-06-24 22:32 GMT+02:00 Dmitry Dolgov <9erthalion6@gmail.com>:
> 
> > > On 23 June 2018 at 08:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > >
> > >
> > > 2018-06-23 8:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
> > >>
> > >> Hi
> > >>
> > >> I try to measure effect of JIT tuple deform and I don't see any possible
> > >> effect.
> > >>
> > >> Is it this feature active in master branch?
> > >>
> > >> Is possible to see this feature in EXPLAIN ANALYZE?
> > >
> > >
> > > Unfortunately I got slowdown
> > >
> > > 0. shared buffers = 1GB
> > > 1. create table with 50 int columns
> > > 2. insert into this table 4M rows
> >
> > Hi,
> >
> > Looks like I can reproduce the situation you're talking about (with some
> > minor
> > differences)
> >
> > =# explain analyze select sum(data45) from test_deform;
> >                                 QUERY PLAN
> > ------------------------------------------------------------
> > -------------------
> >  Finalize Aggregate
> >  (cost=127097.71..127097.72 rows=1 width=8)
> >  (actual time=813.957..813.957 rows=1 loops=1)
> >    ->  Gather
> >        (cost=127097.50..127097.71 rows=2 width=8)
> >        (actual time=813.946..813.950 rows=3 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Partial Aggregate
> >              (cost=126097.50..126097.51 rows=1 width=8)
> >              (actual time=802.585..802.585 rows=1 loops=3)
> >                ->  Parallel Seq Scan on test_deform
> >                    (cost=0.00..121930.80 rows=1666680 width=4)
> >                    (actual time=0.040..207.326 rows=1333333 loops=3)
> >  Planning Time: 0.212 ms
> >  JIT:
> >    Functions: 6
> >    Generation Time: 3.076 ms
> >    Inlining: false
> >    Inlining Time: 0.000 ms
> >    Optimization: false
> >    Optimization Time: 1.328 ms
> >    Emission Time: 8.601 ms
> >  Execution Time: 820.127 ms
> > (16 rows)
> >
> > =# set jit_tuple_deforming to off;
> >
> > =# explain analyze select sum(data45) from test_deform;
> >                                 QUERY PLAN
> > ------------------------------------------------------------
> > -------------------
> >  Finalize Aggregate
> >  (cost=127097.71..127097.72 rows=1 width=8)
> >  (actual time=742.578..742.578 rows=1 loops=1)
> >    ->  Gather
> >        (cost=127097.50..127097.71 rows=2 width=8)
> >        (actual time=742.529..742.569 rows=3 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Partial Aggregate
> >              (cost=126097.50..126097.51 rows=1 width=8)
> >              (actual time=727.876..727.876 rows=1 loops=3)
> >                ->  Parallel Seq Scan on test_deform
> >                    (cost=0.00..121930.80 rows=1666680 width=4)
> >                    (actual time=0.044..204.097 rows=1333333 loops=3)
> >  Planning Time: 0.361 ms
> >  JIT:
> >    Functions: 4
> >    Generation Time: 2.840 ms
> >    Inlining: false
> >    Inlining Time: 0.000 ms
> >    Optimization: false
> >    Optimization Time: 0.751 ms
> >    Emission Time: 6.436 ms
> >  Execution Time: 749.827 ms
> > (16 rows)
> >
> > But at the same time on the bigger dataset (20M rows instead of 4M) the
> > very
> > same query gets better:
> >
> > =# explain analyze select sum(data45) from test_deform;
> >                                 QUERY PLAN
> > ------------------------------------------------------------
> > -------------------
> >  Finalize Aggregate
> >  (cost=631482.92..631482.93 rows=1 width=8)
> >  (actual time=2350.288..2350.288 rows=1 loops=1)
> >    ->  Gather
> >        (cost=631482.71..631482.92 rows=2 width=8)
> >        (actual time=2350.276..2350.279 rows=3 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Partial Aggregate
> >              (cost=630482.71..630482.72 rows=1 width=8)
> >              (actual time=2328.378..2328.379 rows=1 loops=3)
> >                ->  Parallel Seq Scan on test_deform
> >                    (cost=0.00..609649.37 rows=8333337 width=4)
> >                    (actual time=0.027..1175.960 rows=6666667 loops=3)
> >  Planning Time: 0.600 ms
> >  JIT:
> >    Functions: 6
> >    Generation Time: 3.661 ms
> >    Inlining: true
> >    Inlining Time: 65.373 ms
> >    Optimization: true
> >    Optimization Time: 120.885 ms
> >    Emission Time: 58.836 ms
> >  Execution Time: 2543.280 ms
> > (16 rows)
> >
> > =# set jit_tuple_deforming to off;
> >
> > =# explain analyze select sum(data45) from test_deform;
> >                                 QUERY PLAN
> > ------------------------------------------------------------
> > -------------------
> >  Finalize Aggregate
> >  (cost=631482.92..631482.93 rows=1 width=8)
> >  (actual time=3616.977..3616.977 rows=1 loops=1)
> >    ->  Gather
> >        (cost=631482.71..631482.92 rows=2 width=8)
> >        (actual time=3616.959..3616.971 rows=3 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Partial Aggregate
> >              (cost=630482.71..630482.72 rows=1 width=8)
> >              (actual time=3593.220..3593.220 rows=1 loops=3)
> >                ->  Parallel Seq Scan on test_deform
> >                    (cost=0.00..609649.37 rows=8333337 width=4)
> >                    (actual time=0.049..1027.353 rows=6666667 loops=3)
> >  Planning Time: 0.149 ms
> >  JIT:
> >    Functions: 4
> >    Generation Time: 1.803 ms
> >    Inlining: true
> >    Inlining Time: 23.529 ms
> >    Optimization: true
> >    Optimization Time: 18.843 ms
> >    Emission Time: 9.307 ms
> >  Execution Time: 3625.674 ms
> > (16 rows)
> >
> > `perf diff` indeed shows that in the first case (with the 4M rows dataset)
> > the
> > jitted version has some noticeable delta for one call, and unfortunately
> > so far
> > I couldn't figure out which one exactly because of JIT (btw, who can
> > explain
> > how to see a correct full `perf report` in this case? Somehow `perf
> > inject --jit -o
> > perf.data.jitted` and jit_profiling_support didn't help).
> >
> > But since on the bigger dataset I've got expected results, maybe it's just
> > a
> > sign that JIT kicks in too early in this case and what's necessary is to
> > adjust
> > jit_above_cost/jit_optimize_above_cost/jit_inline_above_cost?
> >
> 
> maybe llvm does real compilation too late. It is too strange, because I
> though about JIT cost like initial (fixed) costs. Now, it looks so this
> cost is related to row numbers, and then the situation is much more complex.

I don't understand what you're trying to say here.  Compilation happens
on the first execution of the function. It's a fixed cost. The point
with more rows showing a benefit is that the benefit of JITing is bigger
in total if the function is going to be executed more. And thus at some
point the execution benefit is bigger than the cost of doing JITing.

Greetings,

Andres Freund


Re: effect of JIT tuple deform?

From
Pavel Stehule
Date:


2018-06-25 5:41 GMT+02:00 Andres Freund <andres@anarazel.de>:
On 2018-06-25 05:32:37 +0200, Pavel Stehule wrote:
> 2018-06-24 22:32 GMT+02:00 Dmitry Dolgov <9erthalion6@gmail.com>:
>
> > > On 23 June 2018 at 08:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > >
> > >
> > > 2018-06-23 8:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
> > >>
> > >> Hi
> > >>
> > >> I try to measure effect of JIT tuple deform and I don't see any possible
> > >> effect.
> > >>
> > >> Is it this feature active in master branch?
> > >>
> > >> Is possible to see this feature in EXPLAIN ANALYZE?
> > >
> > >
> > > Unfortunately I got slowdown
> > >
> > > 0. shared buffers = 1GB
> > > 1. create table with 50 int columns
> > > 2. insert into this table 4M rows
> >
> > Hi,
> >
> > Looks like I can reproduce the situation you're talking about (with some
> > minor
> > differences)
> >
> > =# explain analyze select sum(data45) from test_deform;
> >                                 QUERY PLAN
> > ------------------------------------------------------------
> > -------------------
> >  Finalize Aggregate
> >  (cost=127097.71..127097.72 rows=1 width=8)
> >  (actual time=813.957..813.957 rows=1 loops=1)
> >    ->  Gather
> >        (cost=127097.50..127097.71 rows=2 width=8)
> >        (actual time=813.946..813.950 rows=3 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Partial Aggregate
> >              (cost=126097.50..126097.51 rows=1 width=8)
> >              (actual time=802.585..802.585 rows=1 loops=3)
> >                ->  Parallel Seq Scan on test_deform
> >                    (cost=0.00..121930.80 rows=1666680 width=4)
> >                    (actual time=0.040..207.326 rows=1333333 loops=3)
> >  Planning Time: 0.212 ms
> >  JIT:
> >    Functions: 6
> >    Generation Time: 3.076 ms
> >    Inlining: false
> >    Inlining Time: 0.000 ms
> >    Optimization: false
> >    Optimization Time: 1.328 ms
> >    Emission Time: 8.601 ms
> >  Execution Time: 820.127 ms
> > (16 rows)
> >
> > =# set jit_tuple_deforming to off;
> >
> > =# explain analyze select sum(data45) from test_deform;
> >                                 QUERY PLAN
> > ------------------------------------------------------------
> > -------------------
> >  Finalize Aggregate
> >  (cost=127097.71..127097.72 rows=1 width=8)
> >  (actual time=742.578..742.578 rows=1 loops=1)
> >    ->  Gather
> >        (cost=127097.50..127097.71 rows=2 width=8)
> >        (actual time=742.529..742.569 rows=3 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Partial Aggregate
> >              (cost=126097.50..126097.51 rows=1 width=8)
> >              (actual time=727.876..727.876 rows=1 loops=3)
> >                ->  Parallel Seq Scan on test_deform
> >                    (cost=0.00..121930.80 rows=1666680 width=4)
> >                    (actual time=0.044..204.097 rows=1333333 loops=3)
> >  Planning Time: 0.361 ms
> >  JIT:
> >    Functions: 4
> >    Generation Time: 2.840 ms
> >    Inlining: false
> >    Inlining Time: 0.000 ms
> >    Optimization: false
> >    Optimization Time: 0.751 ms
> >    Emission Time: 6.436 ms
> >  Execution Time: 749.827 ms
> > (16 rows)
> >
> > But at the same time on the bigger dataset (20M rows instead of 4M) the
> > very
> > same query gets better:
> >
> > =# explain analyze select sum(data45) from test_deform;
> >                                 QUERY PLAN
> > ------------------------------------------------------------
> > -------------------
> >  Finalize Aggregate
> >  (cost=631482.92..631482.93 rows=1 width=8)
> >  (actual time=2350.288..2350.288 rows=1 loops=1)
> >    ->  Gather
> >        (cost=631482.71..631482.92 rows=2 width=8)
> >        (actual time=2350.276..2350.279 rows=3 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Partial Aggregate
> >              (cost=630482.71..630482.72 rows=1 width=8)
> >              (actual time=2328.378..2328.379 rows=1 loops=3)
> >                ->  Parallel Seq Scan on test_deform
> >                    (cost=0.00..609649.37 rows=8333337 width=4)
> >                    (actual time=0.027..1175.960 rows=6666667 loops=3)
> >  Planning Time: 0.600 ms
> >  JIT:
> >    Functions: 6
> >    Generation Time: 3.661 ms
> >    Inlining: true
> >    Inlining Time: 65.373 ms
> >    Optimization: true
> >    Optimization Time: 120.885 ms
> >    Emission Time: 58.836 ms
> >  Execution Time: 2543.280 ms
> > (16 rows)
> >
> > =# set jit_tuple_deforming to off;
> >
> > =# explain analyze select sum(data45) from test_deform;
> >                                 QUERY PLAN
> > ------------------------------------------------------------
> > -------------------
> >  Finalize Aggregate
> >  (cost=631482.92..631482.93 rows=1 width=8)
> >  (actual time=3616.977..3616.977 rows=1 loops=1)
> >    ->  Gather
> >        (cost=631482.71..631482.92 rows=2 width=8)
> >        (actual time=3616.959..3616.971 rows=3 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Partial Aggregate
> >              (cost=630482.71..630482.72 rows=1 width=8)
> >              (actual time=3593.220..3593.220 rows=1 loops=3)
> >                ->  Parallel Seq Scan on test_deform
> >                    (cost=0.00..609649.37 rows=8333337 width=4)
> >                    (actual time=0.049..1027.353 rows=6666667 loops=3)
> >  Planning Time: 0.149 ms
> >  JIT:
> >    Functions: 4
> >    Generation Time: 1.803 ms
> >    Inlining: true
> >    Inlining Time: 23.529 ms
> >    Optimization: true
> >    Optimization Time: 18.843 ms
> >    Emission Time: 9.307 ms
> >  Execution Time: 3625.674 ms
> > (16 rows)
> >
> > `perf diff` indeed shows that in the first case (with the 4M rows dataset)
> > the
> > jitted version has some noticeable delta for one call, and unfortunately
> > so far
> > I couldn't figure out which one exactly because of JIT (btw, who can
> > explain
> > how to see a correct full `perf report` in this case? Somehow `perf
> > inject --jit -o
> > perf.data.jitted` and jit_profiling_support didn't help).
> >
> > But since on the bigger dataset I've got expected results, maybe it's just
> > a
> > sign that JIT kicks in too early in this case and what's necessary is to
> > adjust
> > jit_above_cost/jit_optimize_above_cost/jit_inline_above_cost?
> >
>
> maybe llvm does real compilation too late. It is too strange, because I
> though about JIT cost like initial (fixed) costs. Now, it looks so this
> cost is related to row numbers, and then the situation is much more complex.

I don't understand what you're trying to say here.  Compilation happens
on the first execution of the function. It's a fixed cost. The point
with more rows showing a benefit is that the benefit of JITing is bigger
in total if the function is going to be executed more. And thus at some
point the execution benefit is bigger than the cost of doing JITing.

Maybe I am little bit messy because I see slowdown about 60ms, but I don't see this value in JIT statistics

Regards

Pavel



Greetings,

Andres Freund

Re: effect of JIT tuple deform?

From
Pavel Stehule
Date:
Hi

I played with it and maybe I got interesting result.

When I played with really big table, then I got IO waits and effect of jit_tuple_deforming is near to zero (as expected)

When I played with smaller table under RAM, then I can see positive effect of JIT_TD, but only when optimization is active. When optimization was false, then JIT_TD has negative slowdown.

Shared buffers 1GB
table wt .. 100columns, 2M rows, cca 823MB
set max_parallel_workers to 0;

query:

select sum(c99) from wt;

default (jit active) ... 1853ms
set jit_tuple_deforming to off .. 1397ms
set jit off .. 1400ms

== enforced inline ==
jit_tuple_deforming on .. 1610ms
jit_tuple_deforming off .. 1420ms

== enforced optimize ==
jit_tuple_deforming on .. 842ms
jit_tuple_deforming off .. 1420ms


I played with setting and I got the best speed with

jit_inline_above_cost ..some pretty high number
jit_optimize_above_cost 0
jit_tuple_deforming on


So I am able to see effect of jit_tuple_deforming, and very well, but only if optimization is active. When optimization is not active then jit_tuple_deforming does slowdown.

So maybe a usage of jit_tuple_deforming can be conditioned by jit_optimization?

Regards

Pavel





Re: effect of JIT tuple deform?

From
Andres Freund
Date:
On 2018-06-26 21:25:54 +0200, Pavel Stehule wrote:
> Hi
> 
> I played with it and maybe I got interesting result.
> 
> When I played with really big table, then I got IO waits and effect of
> jit_tuple_deforming is near to zero (as expected)
> 
> When I played with smaller table under RAM, then I can see positive effect
> of JIT_TD, but only when optimization is active. When optimization was
> false, then JIT_TD has negative slowdown.
> 
> Shared buffers 1GB
> table wt .. 100columns, 2M rows, cca 823MB
> set max_parallel_workers to 0;
> 
> query:
> 
> select sum(c99) from wt;
> 
> default (jit active) ... 1853ms
> set jit_tuple_deforming to off .. 1397ms
> set jit off .. 1400ms
> 
> == enforced inline ==
> jit_tuple_deforming on .. 1610ms
> jit_tuple_deforming off .. 1420ms
> 
> == enforced optimize ==
> jit_tuple_deforming on .. 842ms
> jit_tuple_deforming off .. 1420ms
> 
> 
> I played with setting and I got the best speed with
> 
> jit_inline_above_cost ..some pretty high number
> jit_optimize_above_cost 0
> jit_tuple_deforming on
> 
> 
> So I am able to see effect of jit_tuple_deforming, and very well, but only
> if optimization is active. When optimization is not active then
> jit_tuple_deforming does slowdown.
> 
> So maybe a usage of jit_tuple_deforming can be conditioned by
> jit_optimization?

No, it's definitely useful outside of that. I don't think we should
optimize for one individual query.

Greetings,

Andres Freund


Re: effect of JIT tuple deform?

From
Tomas Vondra
Date:
On 06/26/2018 09:25 PM, Pavel Stehule wrote:
> Hi
> 
> ...
> 
> So I am able to see effect of jit_tuple_deforming, and very well, but 
> only if optimization is active. When optimization is not active then 
> jit_tuple_deforming does slowdown.
> 
> So maybe a usage of jit_tuple_deforming can be conditioned by 
> jit_optimization?
> 

Can you share the test case and some detail about the hardware and 
PostgreSQL configuration?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: effect of JIT tuple deform?

From
Pavel Stehule
Date:


2018-06-27 17:19 GMT+02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
On 06/26/2018 09:25 PM, Pavel Stehule wrote:
Hi

...

So I am able to see effect of jit_tuple_deforming, and very well, but only if optimization is active. When optimization is not active then jit_tuple_deforming does slowdown.

So maybe a usage of jit_tuple_deforming can be conditioned by jit_optimization?


Can you share the test case and some detail about the hardware and PostgreSQL configuration?

I did very simple test


0.

master branch without asserts, shared buffer to 1GB
tested on Lenovo T520 8GB RAM 8CPU, i7
Fedora 28, gcc  CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"  --with-llvm

1.

select 'create table wt(' || string_agg(format('%I int', 'c' || i),',') || ')' from generate_series(1,100) g(i) \gexec


2.

begin;
select 'insert into wt values(' || (select string_agg((random()*10000)::int::text,',') from generate_series(1,j - j + 100) g(i)) || ')' from generate_series(1,1000000) gg(j) \gexec
insert into wt select * from wt;
commit;

3.

set max_paralel_workers to 0; -- the effect of JIT will be more visible

analyze wt;
\timing

select sum(c99) from wt;

I tested some combination of:

jit: off on
jit_inline_above_cost: 0, 10000000000000
jit_optimize_above_cost: 0, 10000000000000
jit_tuple_deforming: on, off


My primitive tests shows nice possitive effect of jit_tuple_deforming if jit optimization is active. When jit optimization is not active, then jit_tuple_deforming did slowdown in my test.

So there is range of costs between 100000 and 500000 where jit_tuple_deforming didn't work well (without optimization)

I am limmited by small memory of my notebook - when I created table larger than 3GB, then I got IO waits on my crypted disc, and any effect of JIT was eliminated.

Regards

Pavel




regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: effect of JIT tuple deform?

From
Pierre Ducroquet
Date:
On Wednesday, June 27, 2018 5:38:31 PM CEST Pavel Stehule wrote:
> 2018-06-27 17:19 GMT+02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
> > On 06/26/2018 09:25 PM, Pavel Stehule wrote:
> >> Hi
> >> 
> >> ...
> >> 
> >> So I am able to see effect of jit_tuple_deforming, and very well, but
> >> only if optimization is active. When optimization is not active then
> >> jit_tuple_deforming does slowdown.
> >> 
> >> So maybe a usage of jit_tuple_deforming can be conditioned by
> >> jit_optimization?
> > 
> > Can you share the test case and some detail about the hardware and
> > PostgreSQL configuration?
> 
> I did very simple test
> 
> 
> 0.
> 
> master branch without asserts, shared buffer to 1GB
> tested on Lenovo T520 8GB RAM 8CPU, i7
> Fedora 28, gcc  CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"  --with-llvm
> 
> 1.
> 
> select 'create table wt(' || string_agg(format('%I int', 'c' || i),',') ||
> ')' from generate_series(1,100) g(i) \gexec
> 
> 
> 2.
> 
> begin;
> select 'insert into wt values(' || (select
> string_agg((random()*10000)::int::text,',') from generate_series(1,j - j +
> 100) g(i)) || ')' from generate_series(1,1000000) gg(j) \gexec
> insert into wt select * from wt;
> commit;
> 
> 3.
> 
> set max_paralel_workers to 0; -- the effect of JIT will be more visible
> 
> analyze wt;
> \timing
> 
> select sum(c99) from wt;
> 
> I tested some combination of:
> 
> jit: off on
> jit_inline_above_cost: 0, 10000000000000
> jit_optimize_above_cost: 0, 10000000000000
> jit_tuple_deforming: on, off
> 
> 
> My primitive tests shows nice possitive effect of jit_tuple_deforming if
> jit optimization is active. When jit optimization is not active, then
> jit_tuple_deforming did slowdown in my test.
> 
> So there is range of costs between 100000 and 500000 where
> jit_tuple_deforming didn't work well (without optimization)
> 
> I am limmited by small memory of my notebook - when I created table larger
> than 3GB, then I got IO waits on my crypted disc, and any effect of JIT was
> eliminated.
> 
> Regards
> 
> Pavel

Hi

I have studied this case a bit, and I think too that there is something wrong 
here.
Right now, jit_optimize is a -O3. It's really expensive, and triggering it 
here is not the right solution. In the attached patch, I force a -O1 for tuple 
deforming. With your test case, on my computer, the results are :
- no jit : 850ms
- jit with tuple deforming without optimizations : 1650 ms (1.5ms spent 
optimizing)
- jit without tuple deforming : 820ms (0.2ms)
- jit with tuple deforming with optimization (-O3) : 770ms (105ms)
- jit with tuple deforming with patch (-O1) : 725ms (54ms)

I will look at the generated code for tuple deforming, but I think we should 
pre-optimize the LLVM bytecode if we do not want to involve the LLVM 
optimization passes.

Regards

 Pierre

Attachment