Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Parallel Seq Scan
Date
Msg-id 54C822A4.7040106@BlueTreble.com
In response to Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 1/26/15 11:11 PM, Amit Kapila wrote:
> On Tue, Jan 27, 2015 at 3:18 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>  >
>  > On 1/23/15 10:16 PM, Amit Kapila wrote:
>  >>
>  >> Further, if we want to just get the benefit of parallel I/O, then
>  >> I think we can get that by parallelising partition scan where different
>  >> table partitions reside on different disk partitions, however that is
>  >> a matter of separate patch.
>  >
>  >
>  > I don't think we even have to go that far.
>  >
>  >
>  > We'd be a lot less sensitive to IO latency.
>  >
>  > I wonder what kind of gains we would see if every SeqScan in a query spawned a worker just to read tuples and shove them in a queue (or shove a pointer to a buffer in the queue).
>  >
>
> Here IIUC, you want to say that just get the read done by one parallel
> worker and then all expression calculation (evaluation of qualification
> and target list) in the main backend, it seems to me that by doing it
> that way, the benefit of parallelisation will be lost due to tuple
> communication overhead (may be the overhead is less if we just
> pass a pointer to buffer but that will have another kind of problems
> like holding buffer pins for a longer period of time).
>
> I could see the advantage of testing on lines as suggested by Tom Lane,
> but that seems to be not directly related to what we want to achieve by
> this patch (parallel seq scan) or if you think otherwise then let me know?

There's some low-hanging fruit when it comes to improving our IO performance (or more specifically, decreasing our
sensitivity to IO latency). Perhaps the way to do that is with the parallel infrastructure, perhaps not. But I think
it's premature to look at parallelism for increasing IO performance, or worrying about things like how many IO threads
we should have before we at least look at simpler things we could do. We shouldn't assume there's nothing to be gained
short of a full parallelization implementation.
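
To make the "worker that just reads tuples and shoves them in a queue" idea concrete, here is a rough sketch in Python. It is not PostgreSQL code and makes no claim about how the patch works. The names reader, seq_scan, qual and queue_size are all hypothetical, and "pages" is just a list of lists standing in for heap pages. The only point is the shape: one thread does the reads, a bounded queue provides back-pressure, and the consumer evaluates the qualification.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the scan


def reader(pages, out_q):
    """Hypothetical reader worker: walks pages and queues their tuples."""
    for page in pages:
        for tup in page:
            out_q.put(tup)  # blocks while the queue is full (back-pressure)
    out_q.put(_DONE)


def seq_scan(pages, qual, queue_size=64):
    """Yield tuples that satisfy qual; the reading happens in another thread."""
    q = queue.Queue(maxsize=queue_size)
    worker = threading.Thread(target=reader, args=(pages, q), daemon=True)
    worker.start()
    while True:
        tup = q.get()
        if tup is _DONE:
            break
        if qual(tup):  # qualification is evaluated by the consumer
            yield tup
    worker.join()


if __name__ == "__main__":
    pages = [[1, 2, 3], [4, 5], [6]]
    print(list(seq_scan(pages, lambda t: t % 2 == 0)))
```

Whether the per-tuple queue traffic eats the gain is exactly the overhead concern raised above; the sketch doesn't answer that, it only shows the structure being discussed.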
 

That's not to say there's nothing else we could use parallelism for. Sort, merge and hash operations come to mind.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


