Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [DESIGN] ParallelAppend
Date
Msg-id CAA4eK1LZodViehcEXbGwA3Lhv20Kq6jrAGo7aHxNK5ZF39w+kw@mail.gmail.com
Whole thread Raw
In response to Re: [DESIGN] ParallelAppend  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Nov 23, 2015 at 10:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Nov 23, 2015 at 7:45 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Without this patch, that 0.5 (or 50% of leaders effort) is considered for
> > Gather node irrespective of the number of workers or other factors, but
> > I think with Patch that is no longer true and that's what I am worrying
> > about.
>
> Nope, that patch does not change that at all.  We probably should, but
> this patch does not.
>

I have taken some performance data with this patch.


- Select data from inheritance hierarchy with very few tuples.


Create table parent_rel(c1 int, c2 text);
Create table child1_rel () Inherits (parent_rel);
Create table child2_rel () Inherits (parent_rel);

insert into parent_rel values(generate_series(1,15), 'aaaa');
insert into child1_rel values(generate_series(10,20),'aaa');
insert into child2_rel values(generate_series(20,30),'aaa');

Analyze parent_rel;
Analyze child1_rel;
Analyze child2_rel;

set max_parallel_degree=4;
set parallel_setup_cost=0;
set parallel_tuple_cost=0.01;

postgres=# explain select count(*) from parent_rel;
                                      QUERY PLAN

--------------------------------------------------------------------------------
------
 Aggregate  (cost=2.71..2.72 rows=1 width=0)
   ->  Gather  (cost=0.00..2.62 rows=37 width=0)
         Number of Workers: 1
         ->  Append  (cost=0.00..2.25 rows=37 width=0)
               ->  Parallel Seq Scan on parent_rel  (cost=0.00..0.77 rows=15 width=0)
               ->  Parallel Seq Scan on child1_rel  (cost=0.00..0.74 rows=11 width=0)
               ->  Parallel Seq Scan on child2_rel  (cost=0.00..0.74 rows=11 width=0)


I have changed parallel_setup_cost and parallel_tuple_cost, so
it is selecting Gather path even for a small relation.  However,
the same won't be true for non-inheritence relation as if the number
of pages in relation are below than threshold (1000), it won't select
parallel path.  Now here we might want to have similar restriction for
Append Relation as well, that if combining all the child subpaths doesn't
have more than threshold number of pages, then don't try to build the
parallel path.

- Choose the data set that fits in shared_buffers and then run statements
with different selectivity and max_parallel_degree

Test setup
----------------
1. Use,  pgbench -i -s 100 <db_name> to create initial data.
2. Use attached pgbench_partitions.sql to create 10 partitions with equal
data. 
3. Use, parallel_append.sh to execute statements with different Selectivity
and max_parallel_degree (changed parallel_tuple_cost to 0.001)

Selection_criteria – 1% of rows will be selected and used costly function
evaluation for each row



Head



max_parallel_degreeexec_time (ms)workers_used

0762020

2285562

4216203

8216933

16216543

32215793

64214743




Patch



max_parallel_degreeexec_time (ms)workers_used

0770270

2270882

4166484

8137305

16137875

32137945

64138725


So here we can see that with Patch, performance is better, but I
think that is mainly due to number of workers working on a plan.
It is not clear that if we would have allowed more workers to
work at higher max_parallel_degree whether that can give us any
substantial benefit, but anyway I think thats a generic worker allocation
improvement which is not directly related to this patch.  The data
at different selectivities can be found in the attached document,
more or less that shows a similar trend.  Apart from this, I have tried
with data set which doesn't fit shared buffers, but fit in RAM, for that
also it shows similar trend.

Patch looks good, apart from worker allocation stuff, but I think we
can deal with that separately.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: problem with msvc linker - cannot build orafce
Next
From: Jim Nasby
Date:
Subject: Re: Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)