Re: Query is over 2x slower with jit=on - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: Query is over 2x slower with jit=on
Date
Msg-id CAJ3gD9cWBKnzXkhu30njaY1au3NO0XD=r4N3GcC_nwFbw5GYYw@mail.gmail.com
Whole thread Raw
In response to Re: Query is over 2x slower with jit=on  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: Query is over 2x slower with jit=on
Re: Query is over 2x slower with jit=on
List pgsql-hackers
On 11 September 2018 at 14:50, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 10 September 2018 at 21:39, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>>
>> On 2018-09-10 15:42:55 +0530, Amit Khandekar wrote:
>>> Attached is a patch that accumulates the per-worker jit counters into
>>> the leader process.
>>
>> Thanks!
>>
>>
>>> I think we better show per-worker jit info also. The current patch
>>> does not show that. I think it would be easy to continue on the patch
>>> to show per-worker info also. Under the Gather node, we can show
>>> per-worker jit counters. I think this would be useful too, besides the
>>> cumulative figures in the leader process. Comments ?
>>
>> Yes, I think that'd be good.
> Ok. Will continue on the patch.
>
>> I think we either should print the stats at
>> the top level as $leader_value, $combined_worker_value, $total_value or
>> just have the $combined_worker_value at the level where we print other
>> stats from the worker, too.
>
> Yes, I think we can follow and be consistent with whatever way in
> which the other worker stats are printed. Will check.
>
> Note: Since there can be a multiple separate Gather plans under a plan
> tree, I think we can show this info for each Gather plan.

The attached patch shows per-worker information, besides the
cumulative figure in the end of plan. Attached is the complete output
of an aggregate parallel query on tenk1 table (created using the
regression tests). You can see that the per-worker figures are under
each of the Gather plans.


We can show combined values of all the workers under a Gather plan as
one single value, but I guess if we just show per-worker values, we
don't have to additionally show combined value. Comments ?
Also, I have kept the the per-worker info only for verbose=true.

The Gather plan's leader value is not shown. We also don't show
leader-specific values when we show the usual per-worker
instrumentation values. So I think we are consistent. Actually it is a
bit tedious to collect only leader-specific values. And I don't think
it is worth trying for it. For worker, I am not collecting
per-plan-node info, because we want to show total worker figures, and
not per-plan-node figures. For the normal instrumentations, we want to
show per-node info. For leader-specific instrumentation, we would need
to store per-(Gather-)plan info.

Need to finalize the indentation and the other explain formats. One
option is to keep the per-worker JIT info in a single line, like how
we print the normal per-worker instrumentation :
Worker 1: actual time=20.971..35.975 rows=39991 loops=1


Below is a snippet :

 Aggregate  (cost=18656.00..18656.01 rows=1 width=8) (actual
time=593.917..593.917 rows=1 loops=1)
   Output: count(*)
   ->  Hash Join  (cost=10718.00..18456.00 rows=80000 width=0) (actual
time=496.650..593.883 rows=16 loops=1)
         Inner Unique: true
         Hash Cond: ((a.unique1 = b.unique1) AND (a.two =
(row_number() OVER (?))))
         ->  Gather  (cost=0.00..5918.00 rows=160000 width=8) (actual
time=192.319..223.384 rows=160000 loops=1)
               Output: a.unique1, a.two
               Workers Planned: 4
               Workers Launched: 4
               Jit for Worker : 0
                 Functions: 2
                 Generation Time: 0.273 ms
                 Inlining: true
                 Inlining Time: 43.686 ms
                 Optimization: true
                 Optimization Time: 10.788 ms
                 Emission Time: 8.438 ms
               Jit for Worker : 1
                 Functions: 2
                 Generation Time: 0.293 ms
                 Inlining: true
                 Inlining Time: 72.587 ms
                 Optimization: true
                 Optimization Time: 10.386 ms
                 Emission Time: 8.115 ms
..........
..........
..........
 Planning Time: 0.548 ms
 Jit:
   Functions: 40
   Generation Time: 3.892 ms
   Inlining: true
   Inlining Time: 409.397 ms
   Optimization: true
   Optimization Time: 174.708 ms
   Emission Time: 91.785 ms
 Execution Time: 610.262 ms
(98 rows)


>
>>
>>
>>>  /*
>>> + * Add up the workers' JIT instrumentation from dynamic shared memory.
>>> + */
>>> +static void
>>> +ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
>>> +                                                                        SharedJitInstrumentation *shared_jit)
>>> +{
>>> +     int                     n;
>>> +     JitContext *jit = planstate->state->es_jit;
>>> +
>>> +     /* If the leader hasn't yet created a jit context, allocate one now. */
>>> +     if (!jit)
>>> +     {
>>> +             planstate->state->es_jit = jit =
>>> +                     jit_create_context(planstate->state->es_jit_flags);
>>> +     }
>>
>> Wouldn't it be better to move the jit instrumentation to outside of the
>> context, to avoid having to do this?  Or just cope with not having
>> instrumentation for the leader in this case?  We'd kinda need to deal
>> with failure to create one anyway?
>
> Yeah, I think taking out the instrumentation out of the context looks
> better. Will work on that.

Not yet done this. Will do now.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: [PATCH] Fix for infinite signal loop in parallel scan
Next
From: Brad DeJong
Date:
Subject: Re: Re[3]: doc - improve description of default privileges