Re: Parallel Seq Scan - Mailing list pgsql-hackers
| From | Thom Brown |
|---|---|
| Subject | Re: Parallel Seq Scan |
| Date | |
| Msg-id | CAA-aLv4Ffpg1WQTQmkkTPo-crOV8HZpFGQeuFd-viKbw0zeFBA@mail.gmail.com |
| In response to | Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: Parallel Seq Scan |
| List | pgsql-hackers |
On 11 November 2015 at 19:26, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I have a first query
>>
>> I looked at the EXPLAIN ANALYZE output and the numbers of filtered rows are
>> different
>
> Hmm, I see I was right about people finding more bugs once this was
> committed. That didn't take long.
>
> There's supposed to be code to handle this - see the
> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
> evidently a few bricks shy of a load.
> ExecParallelReportInstrumentation is supposed to transfer the counts
> from each worker to the DSM:
>
> ps_instrument = &instrumentation->ps_instrument[i];
> SpinLockAcquire(&ps_instrument->mutex);
> InstrAggNode(&ps_instrument->instr, planstate->instrument);
> SpinLockRelease(&ps_instrument->mutex);
>
> And ExecParallelRetrieveInstrumentation is supposed to slurp those
> counts back into the leader's PlanState objects:
>
> /* No need to acquire the spinlock here; workers have exited already. */
> ps_instrument = &instrumentation->ps_instrument[i];
> InstrAggNode(planstate->instrument, &ps_instrument->instr);
>
> This might be a race condition, or it might be just wrong logic.
> Could you test what happens if you insert something like a 1-second
> sleep in ExecParallelFinish just after the call to
> WaitForParallelWorkersToFinish()? If that makes the results
> consistent, this is a race. If it doesn't, something else is wrong:
> then it would be useful to know whether the workers are actually
> calling ExecParallelReportInstrumentation, and whether the leader is
> actually calling ExecParallelRetrieveInstrumentation, and if so
> whether they are doing it for the correct set of nodes.
Hmm.. I made the change, but clearly it's not sleeping properly
(I'm expecting a total runtime in excess of 1 second):
max_parallel_degree = 4:
# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';
                                          QUERY PLAN
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual time=797.518..797.518 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174883 read=540
   ->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual time=0.245..784.959 rows=55151 loops=1)
         Output: content
         Number of Workers: 4
         Buffers: shared hit=174883 read=540
         ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93 rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
               Output: content
               Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
               Rows Removed by Filter: 2051330
               Buffers: shared hit=299224 read=907
 Planning time: 0.086 ms
 Execution time: 803.026 ms
max_parallel_degree = 0:
# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';
                                          QUERY PLAN
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual time=1278.717..1278.717 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174671 read=572
   ->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500 width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
         Output: content
         Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
         Rows Removed by Filter: 1197822
         Buffers: shared hit=174671 read=572
 Planning time: 0.064 ms
 Execution time: 1278.741 ms
(10 rows)
Time: 1279.145 ms
I did, however, notice that repeated runs of the query with
max_parallel_degree = 4 yield different counts of rows removed by
the filter:
Run 1: 2051330
Run 2: 2081252
Run 3: 2065112
Run 4: 2022045
Run 5: 2025384
Run 6: 2059360
Run 7: 2079620
Run 8: 2058541
--
Thom