Re: Parallel Seq Scan - Mailing list pgsql-hackers

| From | Stephen Frost |
|---|---|
| Subject | Re: Parallel Seq Scan |
| Date | |
| Msg-id | 20150111110158.GS3062@tamriel.snowman.net |
| In response to | Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: Parallel Seq Scan |
| List | pgsql-hackers |
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 9, 2015 at 12:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > Yeah, we also need to consider the i/o side of this, which will
> > definitely be tricky.  There are i/o systems out there which are faster
> > than a single CPU and ones where a single CPU can manage multiple i/o
> > channels.  There are also cases where the i/o system handles sequential
> > access nearly as fast as random and cases where sequential is much
> > faster than random.  Where we can get an idea of that distinction is
> > with seq_page_cost vs. random_page_cost, as folks running on SSDs tend
> > to lower random_page_cost from the default to indicate that.
>
> On my MacOS X system, I've already seen cases where my parallel_count
> module runs incredibly slowly some of the time.  I believe that this
> is because having multiple workers reading the relation block-by-block
> at the same time causes the OS to fail to realize that it needs to do
> aggressive readahead.  I suspect we're going to need to account for
> this somehow.

So, for my 2c, I've long expected us to parallelize at the relation-file
level for these kinds of operations.  This goes back to my other thoughts
on how we should be thinking about parallelizing inbound data for bulk
data loads, but it seems appropriate to consider it here also.  One of
the issues there is that 1G still feels like an awful lot for a minimum
work size for each worker, and it would mean we don't parallelize for
relations smaller than that.

On a random VM on my personal server, an uncached 1G read takes over 10s.
Cached, it's less than half that, of course.  This is all spinning rust
(and only 7200 RPM at that) and there's a lot of other stuff going on,
but that still seems like too much of a chunk to give to one worker
unless the overall data set to go through is really large.

There are other issues in there too, of course: if we're dumping data in
like this then we have to either deal with jagged relation files somehow
or pad the file out to 1G, and that doesn't even get into the issues
around how we'd have to redesign the interfaces for relation access, or
how this thinking is an utter violation of the modularity we currently
have there.

> > Yeah, I agree that's more typical.  Robert's point that the master
> > backend should participate is interesting but, as I recall, it was
> > based on the idea that the master could finish faster than the
> > worker - but if that's the case then we've planned it out wrong from
> > the beginning.
>
> So, if the workers have been started but aren't keeping up, the master
> should do nothing until they produce tuples rather than participating?
> That doesn't seem right.

Having the master jump in and start working could screw things up too,
though.  Perhaps we need the master to start working as a fail-safe but
not plan on having things go that way?  Having more processes trying to
do X doesn't always make things better, and the master also needs to
keep up with all the tuples being thrown at it by the workers.

Thanks,

Stephen
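To make the block-by-block vs. chunked assignment distinction concrete, here is a minimal standalone sketch, not PostgreSQL code: the names (`assign_chunks`, `WorkerChunk`, `BLOCKS_PER_SEGMENT`) are made up for illustration. It shows one way each worker could be handed a contiguous range of blocks rather than blocks interleaved round-robin, so that each worker's I/O stream stays sequential and remains recognizable to OS readahead, under the assumption of 8kB blocks and 1GB segment files.

```c
/*
 * Illustrative sketch only (not PostgreSQL source): assign each parallel
 * worker a contiguous range of a relation's blocks, rather than handing
 * out blocks round-robin.  Each worker then reads strictly sequentially
 * within its own range, which is the pattern OS readahead detects;
 * interleaved assignment makes every worker's stream look random.
 */
#include <stdio.h>
#include <stdint.h>

#define BLOCKS_PER_SEGMENT  131072      /* 1GB segment / 8kB block */

typedef struct
{
    uint32_t    first_block;    /* first block this worker reads */
    uint32_t    nblocks;        /* how many consecutive blocks it reads */
} WorkerChunk;

/*
 * Split nblocks_total into nworkers contiguous ranges; the first
 * (nblocks_total % nworkers) workers get one extra block each.
 */
static void
assign_chunks(uint32_t nblocks_total, int nworkers, WorkerChunk *chunks)
{
    uint32_t    base = nblocks_total / nworkers;
    uint32_t    extra = nblocks_total % nworkers;
    uint32_t    next = 0;

    for (int i = 0; i < nworkers; i++)
    {
        chunks[i].first_block = next;
        chunks[i].nblocks = base + (i < (int) extra ? 1 : 0);
        next += chunks[i].nblocks;
    }
}

int
main(void)
{
    /* e.g. a ~3.2GB relation scanned by 4 workers */
    uint32_t    nblocks_total = 3 * BLOCKS_PER_SEGMENT + 25000;
    int         nworkers = 4;
    WorkerChunk chunks[4];

    assign_chunks(nblocks_total, nworkers, chunks);

    for (int i = 0; i < nworkers; i++)
        printf("worker %d: blocks %u..%u (%u blocks, ~%.0f MB sequential)\n",
               i,
               chunks[i].first_block,
               chunks[i].first_block + chunks[i].nblocks - 1,
               chunks[i].nblocks,
               chunks[i].nblocks * 8.0 / 1024.0);

    return 0;
}
```

The chunk boundaries here are purely arithmetic; a real design would still have to settle the granularity question raised above, since one contiguous 1GB chunk per worker means no parallelism at all for relations under that size.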