Re: Parallel query execution - Mailing list pgsql-hackers

From: Stephen Frost
Subject: Re: Parallel query execution
Msg-id: 20130116133354.GG16126@tamriel.snowman.net
In response to: Re: Parallel query execution (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-hackers

* Claudio Freire (klaussfreire@gmail.com) wrote:
> Well, there's the fault in your logic. It won't be as linear.

I really don't see how this has become so difficult to communicate.

It doesn't have to be linear.

We're currently doing massive amounts of parallel processing by hand
using partitioning, tablespaces, and client-side logic to split up the
jobs.  It's certainly *much* faster than doing it in a single thread.
It's also faster with 10 processes going than 5 (we've checked).  With
10 going, we've hit the FC fabric limit (and these are spinning disks in
the SAN, not SSDs).  I'm also sure it'd be much slower if all 10
processes were trying to read data through a single process that's
reading from the I/O system.  We've got some processes which essentially
end up doing that and we don't come anywhere near the total FC fabric
bandwidth when just scanning through the system because, at that point,
you do hit the limits of how fast the individual drive sets can provide
data.
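
As a rough illustration of the point above (scaling that is not linear can still be a large win), here is a minimal sketch of a throughput model in which each process adds bandwidth until a shared ceiling is reached. The function name, the simple min() model, and every number in it are assumptions made up for illustration; none of them are measurements from the system described in this message.

```python
def aggregate_throughput(workers, per_worker_mb_s, fabric_limit_mb_s):
    """Total scan rate: grows with the worker count until the shared fabric saturates."""
    return min(workers * per_worker_mb_s, fabric_limit_mb_s)


# Hypothetical figures, chosen only to show the shape of the curve.
PER_WORKER = 150   # MB/s one process gets from its own drive set (assumed)
FABRIC = 1200      # MB/s ceiling of the shared FC fabric (assumed)

for n in (1, 5, 10, 20):
    total = aggregate_throughput(n, PER_WORKER, FABRIC)
    print(f"{n:2d} workers: {total:5d} MB/s, {total / PER_WORKER:.1f}x one worker")
```

Under these assumed numbers, going from 5 to 10 workers still helps even though the tenth worker no longer adds a full share, and going past 10 adds nothing because the fabric is already saturated.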

To be clear, I'm not suggesting that we would parallelize a SeqScan node
and have the nodes above it be single-threaded.  As I said upthread, we
want to parallelize both reading and processing the data coming in.
Perhaps at some level that works out to not change how we actually *do*
seqscans at all and instead something higher in the plan tree just
creates several of them on independent threads, but it's still going to
end up being parallel I/O in the end.
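
The shape described above, where something higher up starts several independent scans and each one both reads and processes its own data, can be sketched as follows. This is only a structural illustration under stated assumptions: the in-memory PARTITIONS dict stands in for table partitions, and scan_and_count and parallel_count are hypothetical names, not PostgreSQL executor functions. Python threads are used for brevity and say nothing about real performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for table partitions; a real worker would read
# its own partition from disk instead of from a list in memory.
PARTITIONS = {
    "p0": [3, 18, 42, 7],
    "p1": [25, 1, 30],
    "p2": [12, 50, 9, 11, 60],
}


def scan_and_count(rows, threshold):
    """Read one partition and apply the filter in the same worker."""
    return sum(1 for value in rows if value > threshold)


def parallel_count(partitions, threshold, max_workers=4):
    """Start one scan per partition, then combine the partial counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(
            lambda rows: scan_and_count(rows, threshold),
            partitions.values(),
        )
        return sum(partials)


print(parallel_count(PARTITIONS, 10))
```

The only single-threaded step left is adding up the partial counts; the reading and the filtering both happen inside each worker, which is the distinction this message draws against funnelling all rows through one reader.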

I'm done with this thread for now; as brought up, we need to focus on
getting 9.3 out the door.
Thanks,
    Stephen
