Re: Hash partitioning. - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Hash partitioning.
Msg-id CAMkU=1wsR3DvvsxmAsoiQAHGnW+_UFETcQsig-MP6JL9cK098Q@mail.gmail.com
In response to Re: Hash partitioning.  (Markus Wanner <markus@bluegap.ch>)
Responses Re: Hash partitioning.
List pgsql-hackers
On Wed, Jun 26, 2013 at 8:55 AM, Markus Wanner <markus@bluegap.ch> wrote:
On 06/26/2013 05:46 PM, Heikki Linnakangas wrote:
> We could also allow a large query to search a single table in parallel.
> A seqscan would be easy to divide into N equally-sized parts that can be
> scanned in parallel. It's more difficult for index scans, but even then
> it might be possible at least in some limited cases.

So far reading sequentially is still faster than hopping between
different locations. Purely from the I/O perspective, that is.


Wouldn't any I/O system used on a high-end machine be fairly good at making this work through interleaved read-ahead algorithms?  Also, hopefully the planner would be able to predict when parallelization has nothing to add and avoid using it, although that is surely easier said than done.

For queries where the single CPU core turns into a bottle-neck and which
we want to parallelize, we should ideally still do a normal, fully
sequential scan and only fan out after the scan and distribute the
incoming pages (or even tuples) to the multiple cores to process.

That sounds like it would be much more susceptible to lock contention, and harder to get bug-free, than dividing into bigger chunks, like whole 1 gig segments.  
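[Archive note: a hypothetical sketch, not PostgreSQL code. The chunking Jeff describes, where each worker gets a contiguous range of blocks and scans it sequentially on its own, can be illustrated like this; the function name and the 1 GiB segment constant are made up for the example.]

```python
SEGMENT_BLOCKS = 131072  # blocks per 1 GiB segment at an 8 kB block size

def chunk_ranges(total_blocks, n_workers):
    """Divide [0, total_blocks) into n_workers contiguous block ranges.

    Each worker scans only its own range sequentially, so workers need
    no per-tuple coordination, unlike fine-grained fan-out.
    """
    base, extra = divmod(total_blocks, n_workers)
    ranges, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

print(chunk_ranges(10, 3))  # -> [(0, 4), (4, 7), (7, 10)]
```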

Fanning out line by line (according to line_number % number_processes) was my favorite parallelization method in Perl, but those files were read only and so had no concurrency issues.
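[Archive note: the modulo fan-out Jeff mentions can be sketched as follows, in Python rather than Perl purely for illustration. This only works cleanly because, as he says, the input is read-only; with concurrent writers the per-item distribution would need locking.]

```python
def fan_out(lines, n_workers):
    """Assign line i to worker i % n_workers (round-robin)."""
    buckets = [[] for _ in range(n_workers)]
    for i, line in enumerate(lines):
        buckets[i % n_workers].append(line)
    return buckets

print(fan_out(["a", "b", "c", "d", "e"], 2))
# -> [['a', 'c', 'e'], ['b', 'd']]
```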
 
Cheers,

Jeff
