Home > mailing lists

Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [DESIGN] ParallelAppend
Date	November 25, 2015 09:35:46
Msg-id	CAA4eK1LZodViehcEXbGwA3Lhv20Kq6jrAGo7aHxNK5ZF39w+kw@mail.gmail.com Whole thread Raw
In response to	Re: [DESIGN] ParallelAppend (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

Tree view

On Mon, Nov 23, 2015 at 10:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Nov 23, 2015 at 7:45 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Without this patch, that 0.5 (or 50% of leaders effort) is considered for
> > Gather node irrespective of the number of workers or other factors, but
> > I think with Patch that is no longer true and that's what I am worrying
> > about.
>
> Nope, that patch does not change that at all. We probably should, but
> this patch does not.
>

I have taken some performance data with this patch.

- Select data from inheritance hierarchy with very few tuples.

Create table parent_rel(c1 int, c2 text);

Create table child1_rel () Inherits (parent_rel);

Create table child2_rel () Inherits (parent_rel);

insert into parent_rel values(generate_series(1,15), 'aaaa');

insert into child1_rel values(generate_series(10,20),'aaa');

insert into child2_rel values(generate_series(20,30),'aaa');

Analyze parent_rel;

Analyze child1_rel;

Analyze child2_rel;

set max_parallel_degree=4;

set parallel_setup_cost=0;

set parallel_tuple_cost=0.01;

postgres=# explain select count(*) from parent_rel;

QUERY PLAN

--------------------------------------------------------------------------------

------

Aggregate (cost=2.71..2.72 rows=1 width=0)

-> Gather (cost=0.00..2.62 rows=37 width=0)

Number of Workers: 1

-> Append (cost=0.00..2.25 rows=37 width=0)

-> Parallel Seq Scan on parent_rel (cost=0.00..0.77 rows=15 width=0)

-> Parallel Seq Scan on child1_rel (cost=0.00..0.74 rows=11 width=0)

-> Parallel Seq Scan on child2_rel (cost=0.00..0.74 rows=11 width=0)

I have changed parallel_setup_cost and parallel_tuple_cost, so

it is selecting Gather path even for a small relation. However,

the same won't be true for non-inheritence relation as if the number

of pages in relation are below than threshold (1000), it won't select

parallel path. Now here we might want to have similar restriction for

Append Relation as well, that if combining all the child subpaths doesn't

have more than threshold number of pages, then don't try to build the

parallel path.

- Choose the data set that fits in shared_buffers and then run statements

with different selectivity and max_parallel_degree

Test setup

----------------

1. Use, pgbench -i -s 100 <db_name> to create initial data.

2. Use attached pgbench_partitions.sql to create 10 partitions with equal

data.

3. Use, parallel_append.sh to execute statements with different Selectivity

and max_parallel_degree (changed parallel_tuple_cost to 0.001)

Selection_criteria – 1% of rows will be selected and used costly function

evaluation for each row


Head
	max_parallel_degree	exec_time (ms)	workers_used
	0	76202	0
	2	28556	2
	4	21620	3
	8	21693	3
	16	21654	3
	32	21579	3
	64	21474	3

Patch
	max_parallel_degree	exec_time (ms)	workers_used
	0	77027	0
	2	27088	2
	4	16648	4
	8	13730	5
	16	13787	5
	32	13794	5
	64	13872	5

So here we can see that with Patch, performance is better, but I

think that is mainly due to number of workers working on a plan.

It is not clear that if we would have allowed more workers to

work at higher max_parallel_degree whether that can give us any

substantial benefit, but anyway I think thats a generic worker allocation

improvement which is not directly related to this patch. The data

at different selectivities can be found in the attached document,

more or less that shows a similar trend. Apart from this, I have tried

with data set which doesn't fit shared buffers, but fit in RAM, for that

also it shows similar trend.

Patch looks good, apart from worker allocation stuff, but I think we

can deal with that separately.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

From: Craig Ringer
Date: 25 November 2015, 09:00:01
Subject: Re: problem with msvc linker - cannot build orafce

From: Jim Nasby
Date: 25 November 2015, 09:56:15
Subject: Re: Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)

Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

Attachment

Previous

Next