Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel Seq Scan
Date
Msg-id CA+TgmoZxTeVEHm6p96YMsZtWr6J9dgGoaC_ZTKnzLLvfBH9QEw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Parallel Seq Scan
List pgsql-hackers
On Fri, Jan 9, 2015 at 12:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> The parameters sound reasonable but I'm a bit worried about the way
> you're describing the implementation.  Specifically this comment:
>
> "Cost of starting up parallel workers with default value as 1000.0
> multiplied by number of workers decided for scan."
>
> That appears to imply that we'll decide on the number of workers, figure
> out the cost, and then consider "parallel" as one path and
> "not-parallel" as another.  [...]
> I'd really like to be able to set the 'max parallel' high and then have
> the optimizer figure out how many workers should actually be spawned for
> a given query.

+1.

> Yeah, we also need to consider the i/o side of this, which will
> definitely be tricky.  There are i/o systems out there which are faster
> than a single CPU and ones where a single CPU can manage multiple i/o
> channels.  There are also cases where the i/o system handles sequential
> access nearly as fast as random and cases where sequential is much
> faster than random.  Where we can get an idea of that distinction is
> with seq_page_cost vs. random_page_cost as folks running on SSDs tend to
> lower random_page_cost from the default to indicate that.

On my MacOS X system, I've already seen cases where my parallel_count
module runs incredibly slowly some of the time.  I believe that this
is because having multiple workers reading the relation block-by-block
at the same time causes the OS to fail to realize that it needs to do
aggressive readahead.  I suspect we're going to need to account for
this somehow.

> Yeah, I agree that's more typical.  Robert's point that the master
> backend should participate is interesting but, as I recall, it was based
> on the idea that the master could finish faster than the worker- but if
> that's the case then we've planned it out wrong from the beginning.

So, if the workers have been started but aren't keeping up, the master
should do nothing until they produce tuples rather than participating?That doesn't seem right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Parallel Seq Scan
Next
From: Andreas Karlsson
Date:
Subject: Re: Using 128-bit integers for sum, avg and statistics aggregates