Re: parallel joins, and better parallel explain - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: parallel joins, and better parallel explain
Date
Msg-id CAA4eK1+nObKaKMy+g7iiqZVXp5-=tnZQ5s-sQu3qWoUkjD_SgQ@mail.gmail.com
In response to parallel joins, and better parallel explain  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: parallel joins, and better parallel explain  (Amit Kapila <amit.kapila16@gmail.com>)
Re: parallel joins, and better parallel explain  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Nov 26, 2015 at 8:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Attached find a patch that does (mostly) two things.
>

I have started looking into this and would like to share a few findings
with you:

-
+ /*
+ * Primitive parallel cost model.  Assume the leader will do half as much
+ * work as a regular worker, because it will also need to read the tuples
+ * returned by the workers when they percolate up to the gather ndoe.
+ * This is almost certainly not exactly the right way to model this, so
+ * this will probably need to be changed at some point...
+ */
+ if (path->parallel_degree > 0)
+ 	path->rows = path->rows / (path->parallel_degree + 0.5);
+
  if (!enable_seqscan)
  	startup_cost += disable_cost;

@@ -225,16 +234,6 @@ cost_seqscan(Path *path, PlannerInfo *root,
  	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
  	run_cost += cpu_per_tuple * baserel->tuples;

- /*
- * Primitive parallel cost model.  Assume the leader will do half as much
- * work as a regular worker, because it will also need to read the tuples
- * returned by the workers when they percolate up to the gather ndoe.
- * This is almost certainly not exactly the right way to model this, so
- * this will probably need to be changed at some point...
- */
- if (nworkers > 0)
- 	run_cost = run_cost / (nworkers + 0.5);
-
..

The above change, together with the changes in add_path(), makes the planner
no longer select a parallel path for a seq scan where it was possible earlier.
I think you want to base the costing of parallel plans on the rows selected
instead of on total_cost, but there seems to be some problem in the logic
(I think the Gather node is not taking the reduced cost into account).
Consider the case below:

create table tbl_parallel_test(c1 int, c2 char(1000));
insert into tbl_parallel_test values(generate_series(1,1000000),'aaaaa');
Analyze tbl_parallel_test;

set max_parallel_degree=6;
Explain select count(*) from tbl_parallel_test where c1 < 10000;

Without the patch, the planner is able to use a parallel plan for the above
case; with the patch, it is not.
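To make the quoted cost-model change concrete, here is a small Python sketch of the row scaling it applies (the function name is mine, not from the patch), checked against the numbers in the plans shown further below:

```python
# Sketch of the row-estimate scaling quoted from the patch above:
# path->rows is divided by (parallel_degree + 0.5), treating the
# leader as contributing half of one worker's share of the scan.
def parallel_rows(total_rows: float, parallel_degree: int) -> float:
    if parallel_degree > 0:
        return total_rows / (parallel_degree + 0.5)
    return total_rows

# Using the numbers from the plans below: the Gather node estimates
# 9579 rows, and with 4 workers the per-process share comes out to
# ~2129, matching the rows=2129 shown on the Parallel Seq Scan node.
print(round(parallel_rows(9579, 4)))  # → 2129
```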

- There seems to be some inconsistency in Explain's output when
multiple workers are used.

Case-1
Consider the table is populated as mentioned above.
change max_worker_processes=2 in postgresql.conf
set max_parallel_degree=4;

Explain (Analyze,Verbose) select count(*) from tbl_parallel_test where c1 < 10000;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46227.78..46227.79 rows=1 width=0) (actual time=182583.554..182583.555 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual time=167930.039..182571.654 rows=9999 loops=1)
         Output: c1, c2
         Number of Workers: 4
         ->  Parallel Seq Scan on public.tbl_parallel_test  (cost=0.00..44245.93 rows=2129 width=0) (actual time=167904.516..182498.494 rows=3333 loops=3)
               Output: c1, c2
               Filter: (tbl_parallel_test.c1 < 10000)
               Rows Removed by Filter: 330000
               Worker 0: actual time=167890.584..182491.043 rows=4564 loops=1
               Worker 1: actual time=167893.651..182461.904 rows=2740 loops=1
 Planning time: 0.121 ms
 Execution time: 182588.419 ms
(13 rows)


1. Rows Removed by Filter should be 990001.
2. The total of rows=9999 at the Gather node is right, but it is not obvious
how the rows reported for each worker and the leader lead to that total.
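On the second point, the numbers do reconcile once one assumes the scan line reports a per-loop average (which is what EXPLAIN normally shows), but the leader's share has to be inferred rather than read off the output:

```python
# Checking the arithmetic behind point 2, using the Case-1 numbers:
gather_rows = 9999              # total rows at the Gather node
worker_rows = [4564, 2740]      # Worker 0 and Worker 1 lines

# The leader's contribution is never printed; it can only be inferred
# by subtracting the worker totals from the Gather total:
leader_rows = gather_rows - sum(worker_rows)
print(leader_rows)              # → 2695

# The scan line's rows=3333 with loops=3 is a per-loop average, so it
# does reproduce the total, but nothing in the output says so:
print(3333 * 3)                 # → 9999
```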

Case-2

change max_worker_processes=8 in postgresql.conf
set max_parallel_degree=4;

postgres=# Explain (Analyze,Verbose) select count(*) from tbl_parallel_test where c1 < 10000;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46227.78..46227.79 rows=1 width=0) (actual time=39365.233..39365.234 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual time=47.485..39344.574 rows=9999 loops=1)
         Output: c1, c2
         Number of Workers: 4
         ->  Parallel Seq Scan on public.tbl_parallel_test  (cost=0.00..44245.93 rows=2129 width=0) (actual time=11.910..39262.255 rows=2000 loops=5)
               Output: c1, c2
               Filter: (tbl_parallel_test.c1 < 10000)
               Rows Removed by Filter: 198000
               Worker 0: actual time=5.931..39249.068 rows=3143 loops=1
               Worker 1: actual time=5.778..39254.504 rows=1687 loops=1
               Worker 2: actual time=0.836..39264.683 rows=2170 loops=1
               Worker 3: actual time=1.101..39251.459 rows=1715 loops=1
 Planning time: 0.123 ms
 Execution time: 39383.296 ms
(15 rows)

The problems reported in the previous case are visible in this case as
well.  I think both are due to the same problem.

Case-3
postgres=# Explain (Analyze,Verbose,Buffers) select count(*) from tbl_parallel_test where c1 < 10000;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46227.78..46227.79 rows=1 width=0) (actual time=33607.146..33607.147 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=548 read=142506
   ->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual time=33.983..33592.030 rows=9999 loops=1)
         Output: c1, c2
         Number of Workers: 4
         Buffers: shared hit=548 read=142506
         ->  Parallel Seq Scan on public.tbl_parallel_test  (cost=0.00..44245.93 rows=2129 width=0) (actual time=13.447..33354.099 rows=2000 loops=5)
               Output: c1, c2
               Filter: (tbl_parallel_test.c1 < 10000)
               Rows Removed by Filter: 198000
               Buffers: shared hit=352 read=142506
               Worker 0: actual time=18.422..33322.132 rows=2170 loops=1
                 Buffers: shared hit=4 read=30765
               Worker 1: actual time=0.803..33283.979 rows=1890 loops=1
                 Buffers: shared hit=1 read=26679
               Worker 2: actual time=0.711..33360.007 rows=1946 loops=1
                 Buffers: shared hit=197 read=30899
               Worker 3: actual time=15.057..33252.605 rows=2145 loops=1
                 Buffers: shared hit=145 read=25433
 Planning time: 0.217 ms
 Execution time: 33612.964 ms
(22 rows)

I am not able to understand how the buffer usage adds up to what is
shown at the Gather node.
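As a concrete check of this point, summing the per-worker counters from Case-3 (the interpretation of the leftover as the leader's share is my assumption):

```python
# Summing the per-worker buffer counters from Case-3:
worker_hit  = [4, 1, 197, 145]
worker_read = [30765, 26679, 30899, 25433]
scan_hit, scan_read = 352, 142506   # totals on the Parallel Seq Scan
gather_hit = 548                    # total shown at the Gather node

# The scan-level totals do decompose into workers plus a leader share:
print(scan_hit - sum(worker_hit))    # → 5 (leader's hits)
print(scan_read - sum(worker_read))  # → 28730 (leader's reads)

# ...but the Gather node reports 196 more hits than the scan node
# under it, which is the part that does not obviously add up:
print(gather_hit - scan_hit)         # → 196
```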

-  I think it would be better if we added some explanation of this to the
Explain - Verbose section of the documentation, along with an example on
the same page.  That would help users understand this feature.


It would be better if we could split this patch into multiple patches, such
as the Explain-related changes, the Append pushdown changes, and the Join
pushdown changes.  You can choose to push the patches as you prefer, but
splitting them would certainly help in review/verification of the code.

With Regards,
Amit Kapila.
