Re: Parallel Seq Scan - Mailing list pgsql-hackers

From: Thom Brown
Subject: Re: Parallel Seq Scan
Msg-id: CAA-aLv4Ffpg1WQTQmkkTPo-crOV8HZpFGQeuFd-viKbw0zeFBA@mail.gmail.com
In response to: Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Parallel Seq Scan (Thom Brown <thom@linux.com>)
List: pgsql-hackers
On 11 November 2015 at 19:26, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I have a first query
>>
>> I looked at the EXPLAIN ANALYZE output and the numbers of filtered rows
>> are different.
>
> Hmm, I see I was right about people finding more bugs once this was
> committed.  That didn't take long.
>
> There's supposed to be code to handle this - see the
> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
> evidently a few bricks shy of a load.
> ExecParallelReportInstrumentation is supposed to transfer the counts
> from each worker to the DSM:
>
>         ps_instrument = &instrumentation->ps_instrument[i];
>         SpinLockAcquire(&ps_instrument->mutex);
>         InstrAggNode(&ps_instrument->instr, planstate->instrument);
>         SpinLockRelease(&ps_instrument->mutex);
>
> And ExecParallelRetrieveInstrumentation is supposed to slurp those
> counts back into the leader's PlanState objects:
>
>         /* No need to acquire the spinlock here; workers have exited already. */
>         ps_instrument = &instrumentation->ps_instrument[i];
>         InstrAggNode(planstate->instrument, &ps_instrument->instr);
>
> This might be a race condition, or it might be just wrong logic.
> Could you test what happens if you insert something like a 1-second
> sleep in ExecParallelFinish just after the call to
> WaitForParallelWorkersToFinish()?  If that makes the results
> consistent, this is a race.  If it doesn't, something else is wrong:
> then it would be useful to know whether the workers are actually
> calling ExecParallelReportInstrumentation, and whether the leader is
> actually calling ExecParallelRetrieveInstrumentation, and if so
> whether they are doing it for the correct set of nodes.
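
For anyone following along, my understanding of the suggested change is
roughly the sketch below (the surrounding lines in ExecParallelFinish()
are from memory and may not match the tree exactly):

    /* In ExecParallelFinish(), src/backend/executor/execParallel.c */

    /* First, wait for the workers to finish. */
    WaitForParallelWorkersToFinish(pei->pcxt);

    /*
     * Debugging aid: sleep for 1 second so that any race with the
     * workers' calls to ExecParallelReportInstrumentation has time to
     * show up.  pg_usleep() takes microseconds and is declared in
     * miscadmin.h.
     */
    pg_usleep(1000000L);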

Hmm... I made the change, but clearly it's not sleeping properly
(I was expecting a total runtime in excess of 1 second):

max_parallel_degree = 4:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual time=797.518..797.518 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174883 read=540
   ->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual time=0.245..784.959 rows=55151 loops=1)
         Output: content
         Number of Workers: 4
         Buffers: shared hit=174883 read=540
         ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93 rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
               Output: content
               Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
               Rows Removed by Filter: 2051330
               Buffers: shared hit=299224 read=907
 Planning time: 0.086 ms
 Execution time: 803.026 ms


max_parallel_degree = 0:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual time=1278.717..1278.717 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174671 read=572
   ->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500 width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
         Output: content
         Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
         Rows Removed by Filter: 1197822
         Buffers: shared hit=174671 read=572
 Planning time: 0.064 ms
 Execution time: 1278.741 ms
(10 rows)

Time: 1279.145 ms


I did, however, notice that repeated runs of the query with
max_parallel_degree = 4 yields different counts of rows removed by
filter:

Run 1: 2051330
Run 2: 2081252
Run 3: 2065112
Run 4: 2022045
Run 5: 2025384
Run 6: 2059360
Run 7: 2079620
Run 8: 2058541

-- 
Thom


