Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1+mn=OB1xpw8st_9vN9jw0UAkWfmMNCmy1THrVxzKFFvg@mail.gmail.com
In response to Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Fri, Dec 5, 2014 at 8:46 PM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Amit,
>
> * Amit Kapila (amit.kapila16@gmail.com) wrote:
> > postgres=# explain select c1 from t1;
> >                       QUERY PLAN
> > ------------------------------------------------------
> >  Seq Scan on t1  (cost=0.00..101.00 rows=100 width=4)
> > (1 row)
> >
> >
> > postgres=# set parallel_seqscan_degree=4;
> > SET
> > postgres=# explain select c1 from t1;
> >                           QUERY PLAN
> > --------------------------------------------------------------
> >  Parallel Seq Scan on t1  (cost=0.00..25.25 rows=100 width=4)
> >    Number of Workers: 4
> >    Number of Blocks Per Workers: 25
> > (3 rows)
>
> This is all great and interesting, but I feel like folks might be
> waiting to see just what kind of performance results come from this (and
> what kind of hardware is needed to see gains..).

Initially I was thinking that we should first discuss whether the design
and idea used in the patch are sane, but since you have asked (and
Robert has asked me the same off-list), I will collect performance
data next week.  (Another reason I have not taken any data yet is
that the work to push qualification down to the workers is still
pending, which I feel is quite important.)  However, I still think it
would be good to get some feedback on a few basic points like the
ones below.

1. As the patch currently stands, it just shares the relevant
data (like relid, target list, the block range each worker should
operate on, etc.) with the worker; the worker then receives that
data, forms the planned statement it will execute, and sends the
results back to the master backend.  So the question here is: do
you think this is reasonable, or should we try to form the complete
plan for each worker and then share that, possibly along with other
required information such as range table entries?  My personal gut
feeling is that in the long term it might be better to form each
worker's complete plan in the master and share it; however, I think
the current approach taken in the patch (granted, it needs some
improvement) is also not bad, and is considerably easier to implement.

2. The next question, related to the above, is what the output of
EXPLAIN should be.  As each worker is currently responsible for
forming its own plan, EXPLAIN is not able to show the detailed
plan for each worker.  Is that okay?

3. Some places where optimizations are possible:
- Currently, after getting a tuple from the heap, it is deformed by the
worker and sent via the message queue to the master backend; the
master backend then re-forms the tuple and sends it to the upper
layer, which deforms it yet again via slot_getallattrs(slot) before
sending it to the frontend.
- The master backend currently receives the data from the workers
serially.  We could optimize this so that it checks the other queues
when there is no data in the current one.
- The master backend is currently responsible only for coordination
among workers: it shares the required information with them and then
fetches the data each worker has processed.  With some more logic, we
might be able to make the master backend also fetch data from the heap
rather than doing just coordination.

I think optimization is possible in all of the places above; however,
we can also do that later, unless these hit performance badly for the
cases people care about most.

4. Should the parallel_seqscan_degree value depend on other
backend-process limits (as MaxConnections, max_worker_processes,
and autovacuum_max_workers do), or should it be independent,
like max_wal_senders?

I think it is better to make it dependent on the other backend-process
limits; however, for simplicity, I have kept it similar to max_wal_senders
for now.

> There's likely to be
> situations where this change is an improvement while also being cases
> where it makes things worse.

Agreed, and I think that will become clearer after doing some
performance tests.

> One really interesting case would be parallel seq scans which are
> executing against foreign tables/FDWs..
>

Sure.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
