Re: parallel joins, and better parallel explain - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: parallel joins, and better parallel explain
Date
Msg-id CAA4eK1+nObKaKMy+g7iiqZVXp5-=tnZQ5s-sQu3qWoUkjD_SgQ@mail.gmail.com
In response to parallel joins, and better parallel explain  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: parallel joins, and better parallel explain  (Amit Kapila <amit.kapila16@gmail.com>)
Re: parallel joins, and better parallel explain  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Nov 26, 2015 at 8:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Attached find a patch that does (mostly) two things.
>

I have started looking into this and would like to share a few findings
with you:

-
+ /*
+ * Primitive parallel cost model.  Assume the leader will do half as much
+ * work as a regular worker, because it will also need to read the tuples
+ * returned by the workers when they percolate up to the gather ndoe.
+ * This is almost certainly not exactly the right way to model this, so
+ * this will probably need to be changed at some point...
+ */
+ if (path->parallel_degree > 0)
+ 	path->rows = path->rows / (path->parallel_degree + 0.5);
+
  if (!enable_seqscan)
  	startup_cost += disable_cost;

@@ -225,16 +234,6 @@ cost_seqscan(Path *path, PlannerInfo *root,
  	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
  	run_cost += cpu_per_tuple * baserel->tuples;

- /*
- * Primitive parallel cost model.  Assume the leader will do half as much
- * work as a regular worker, because it will also need to read the tuples
- * returned by the workers when they percolate up to the gather ndoe.
- * This is almost certainly not exactly the right way to model this, so
- * this will probably need to be changed at some point...
- */
- if (nworkers > 0)
- 	run_cost = run_cost / (nworkers + 0.5);
-
..

The above change, together with the changes in add_path(), makes the planner
no longer select a parallel path for a seq scan where it was possible earlier.
I think you want to base the costing of parallel plans on the rows selected
instead of on total_cost, but there seems to be some problem in the logic
(I think the Gather node is not taking the reduced cost into account).
Consider the case below:

create table tbl_parallel_test(c1 int, c2 char(1000));
insert into tbl_parallel_test values(generate_series(1,1000000),'aaaaa');
Analyze tbl_parallel_test;

set max_parallel_degree=6;
Explain select count(*) from tbl_parallel_test where c1 < 10000;

Without the patch, the planner is able to use a parallel plan for the above
case; with the patch, it is not.
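To make the quoted cost-model change concrete, here is a small Python sketch of the row scaling it applies (the function name is mine, not from the patch), checked against the numbers in the plans shown further below:

```python
# Sketch of the row-estimate scaling quoted from the patch above:
# path->rows is divided by (parallel_degree + 0.5), treating the
# leader as contributing half of one worker's share of the scan.
def parallel_rows(total_rows: float, parallel_degree: int) -> float:
    if parallel_degree > 0:
        return total_rows / (parallel_degree + 0.5)
    return total_rows

# Using the numbers from the plans below: the Gather node estimates
# 9579 rows, and with 4 workers the per-process share comes out to
# ~2129, matching the rows=2129 shown on the Parallel Seq Scan node.
print(round(parallel_rows(9579, 4)))  # → 2129
```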

- There seems to be some inconsistency in Explain's output when
multiple workers are used.

Case-1
Consider the table is populated as mentioned above.
change max_worker_processes=2 in postgresql.conf
set max_parallel_degree=4;

Explain (Analyze,Verbose) select count(*) from tbl_parallel_test where c1 < 10000;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46227.78..46227.79 rows=1 width=0) (actual time=182583.554..182583.555 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual time=167930.039..182571.654 rows=9999 loops=1)
         Output: c1, c2
         Number of Workers: 4
         ->  Parallel Seq Scan on public.tbl_parallel_test  (cost=0.00..44245.93 rows=2129 width=0) (actual time=167904.516..182498.494 rows=3333 loops=3)
               Output: c1, c2
               Filter: (tbl_parallel_test.c1 < 10000)
               Rows Removed by Filter: 330000
               Worker 0: actual time=167890.584..182491.043 rows=4564 loops=1
               Worker 1: actual time=167893.651..182461.904 rows=2740 loops=1
 Planning time: 0.121 ms
 Execution time: 182588.419 ms
(13 rows)


1. Rows Removed by Filter should be 990001.
2. The total of rows=9999 at the Gather node is right, but it is not obvious
how the rows reported for each worker and the leader lead to that total.
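On the second point, the numbers do reconcile once one assumes the scan line reports a per-loop average (which is what EXPLAIN normally shows), but the leader's share has to be inferred rather than read off the output:

```python
# Checking the arithmetic behind point 2, using the Case-1 numbers:
gather_rows = 9999              # total rows at the Gather node
worker_rows = [4564, 2740]      # Worker 0 and Worker 1 lines

# The leader's contribution is never printed; it can only be inferred
# by subtracting the worker totals from the Gather total:
leader_rows = gather_rows - sum(worker_rows)
print(leader_rows)              # → 2695

# The scan line's rows=3333 with loops=3 is a per-loop average, so it
# does reproduce the total, but nothing in the output says so:
print(3333 * 3)                 # → 9999
```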

Case-2

change max_worker_processes=8 in postgresql.conf
set max_parallel_degree=4;

postgres=# Explain (Analyze,Verbose) select count(*) from tbl_parallel_test where c1 < 10000;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46227.78..46227.79 rows=1 width=0) (actual time=39365.233..39365.234 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual time=47.485..39344.574 rows=9999 loops=1)
         Output: c1, c2
         Number of Workers: 4
         ->  Parallel Seq Scan on public.tbl_parallel_test  (cost=0.00..44245.93 rows=2129 width=0) (actual time=11.910..39262.255 rows=2000 loops=5)
               Output: c1, c2
               Filter: (tbl_parallel_test.c1 < 10000)
               Rows Removed by Filter: 198000
               Worker 0: actual time=5.931..39249.068 rows=3143 loops=1
               Worker 1: actual time=5.778..39254.504 rows=1687 loops=1
               Worker 2: actual time=0.836..39264.683 rows=2170 loops=1
               Worker 3: actual time=1.101..39251.459 rows=1715 loops=1
 Planning time: 0.123 ms
 Execution time: 39383.296 ms
(15 rows)

The problems reported in the previous case are visible in this case as
well.  I think both are due to the same problem.

Case-3
postgres=# Explain (Analyze,Verbose,Buffers) select count(*) from tbl_parallel_test where c1 < 10000;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46227.78..46227.79 rows=1 width=0) (actual time=33607.146..33607.147 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=548 read=142506
   ->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual time=33.983..33592.030 rows=9999 loops=1)
         Output: c1, c2
         Number of Workers: 4
         Buffers: shared hit=548 read=142506
         ->  Parallel Seq Scan on public.tbl_parallel_test  (cost=0.00..44245.93 rows=2129 width=0) (actual time=13.447..33354.099 rows=2000 loops=5)
               Output: c1, c2
               Filter: (tbl_parallel_test.c1 < 10000)
               Rows Removed by Filter: 198000
               Buffers: shared hit=352 read=142506
               Worker 0: actual time=18.422..33322.132 rows=2170 loops=1
                 Buffers: shared hit=4 read=30765
               Worker 1: actual time=0.803..33283.979 rows=1890 loops=1
                 Buffers: shared hit=1 read=26679
               Worker 2: actual time=0.711..33360.007 rows=1946 loops=1
                 Buffers: shared hit=197 read=30899
               Worker 3: actual time=15.057..33252.605 rows=2145 loops=1
                 Buffers: shared hit=145 read=25433
 Planning time: 0.217 ms
 Execution time: 33612.964 ms
(22 rows)

I am not able to understand how the buffer usage adds up to what is
shown at the Gather node.
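As a concrete check of this point, summing the per-worker counters from Case-3 (the interpretation of the leftover as the leader's share is my assumption):

```python
# Summing the per-worker buffer counters from Case-3:
worker_hit  = [4, 1, 197, 145]
worker_read = [30765, 26679, 30899, 25433]
scan_hit, scan_read = 352, 142506   # totals on the Parallel Seq Scan
gather_hit = 548                    # total shown at the Gather node

# The scan-level totals do decompose into workers plus a leader share:
print(scan_hit - sum(worker_hit))    # → 5 (leader's hits)
print(scan_read - sum(worker_read))  # → 28730 (leader's reads)

# ...but the Gather node reports 196 more hits than the scan node
# under it, which is the part that does not obviously add up:
print(gather_hit - scan_hit)         # → 196
```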

-  I think it would be better if we added some explanation of this to the
Explain - Verbose section of the documentation, along with an example on
the same page.  That would help users understand this feature.


It would be better if we could split this patch into multiple patches, such
as the Explain-related changes, the Append pushdown changes, and the Join
pushdown changes.  You can choose to push the patches as you prefer, but
splitting them would certainly help in review/verification of the code.

With Regards,
Amit Kapila.
