Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Thom Brown
Subject Re: Parallel Seq Scan
Date
Msg-id CAA-aLv7xgHXmcdxt6uHBMTUfJ_rnED8RbNh2JW30BHc=CF1x4Q@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Parallel Seq Scan  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
On 11 November 2015 at 17:59, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> I have a first query
>
> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> differen
>
> postgres=# set max_parallel_degree to 4;
> SET
> Time: 0.717 ms
> postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
>
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                          QUERY PLAN
> │
>
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> │ Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual
> time=142.541..142.541 rows=1 loops=1)                               │
> │   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual
> time=0.633..130.926 rows=100000 loops=1)                      │
> │         Number of Workers: 2
> │
> │         ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000
> width=0) (actual time=0.052..411.303 rows=169631 loops=1) │
> │               Filter: ((a % 10) = 0)
> │
> │               Rows Removed by Filter: 1526399
> │
> │ Planning time: 0.167 ms
> │
> │ Execution time: 144.519 ms
> │
>
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (8 rows)
>
> Time: 145.374 ms
> postgres=# set max_parallel_degree to 1;
> SET
> Time: 0.706 ms
> postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
>
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                           QUERY PLAN
> │
>
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> │ Aggregate  (cost=14462.50..14462.51 rows=1 width=0) (actual
> time=163.355..163.355 rows=1 loops=1)                              │
> │   ->  Gather  (cost=1000.00..14450.00 rows=5000 width=0) (actual
> time=0.485..152.827 rows=100000 loops=1)                      │
> │         Number of Workers: 1
> │
> │         ->  Parallel Seq Scan on xxx  (cost=0.00..12950.00 rows=5000
> width=0) (actual time=0.043..309.740 rows=145364 loops=1) │
> │               Filter: ((a % 10) = 0)
> │
> │               Rows Removed by Filter: 1308394
> │
> │ Planning time: 0.129 ms
> │
> │ Execution time: 165.102 ms
> │
>
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (8 rows)
>
> Rows removed by filter: 1308394 X 1526399. Is it expected?

Yeah, I noticed the same thing, but more pronounced:

With set max_parallel_degree = 4:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';
           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=49575.51..49575.52 rows=1 width=0) (actual
 
time=744.267..744.267 rows=1 loops=1)  Output: count(*)  Buffers: shared hit=175423  ->  Gather
(cost=1000.00..49544.27rows=12496 width=0) (actual
 
time=0.351..731.662 rows=55151 loops=1)        Output: content        Number of Workers: 4        Buffers: shared
hit=175423       ->  Parallel Seq Scan on public.js  (cost=0.00..47294.67
 
rows=12496 width=0) (actual time=0.030..5912.118 rows=96062 loops=1)              Output: content              Filter:
(((((js.content-> 'tags'::text) -> 0) ->>
 
'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))              Rows Removed by Filter: 2085546              Buffers: shared
hit=305123Planningtime: 0.123 msExecution time: 759.313 ms
 
(14 rows)


With set max_parallel_degree = 0:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->0->>'term' like 'design%' or
content->'tags'->0->>'term' like 'web%';
        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=212857.25..212857.26 rows=1 width=0) (actual
 
time=1235.082..1235.082 rows=1 loops=1)  Output: count(*)  Buffers: shared hit=175243  ->  Seq Scan on public.js
(cost=0.00..212826.01rows=12496
 
width=0) (actual time=0.019..1228.515 rows=55151 loops=1)        Output: content        Filter: (((((js.content ->
'tags'::text)-> 0) ->>
 
'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
-> 0) ->> 'term'::text) ~~ 'web%'::text))        Rows Removed by Filter: 1197822        Buffers: shared
hit=175243Planningtime: 0.064 msExecution time: 1235.108 ms
 
(10 rows)

Time: 1235.517 ms


Rows removed: 2085546 vs 1197822
Buffers hit: 305123 vs 175243

Thom

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Parallel Seq Scan
Next
From: Pavel Stehule
Date:
Subject: Re: Parallel Seq Scan