Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Parallel Seq Scan
Date
Msg-id 17752.1422311968@sss.pgh.pa.us
Whole thread Raw
In response to Re: Parallel Seq Scan  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 1/23/15 10:16 PM, Amit Kapila wrote:
>> Further, if we want to just get the benefit of parallel I/O, then
>> I think we can get that by parallelising partition scan where different
>> table partitions reside on different disk partitions, however that is
>> a matter of separate patch.

> I don't think we even have to go that far.

> My experience with Postgres is that it is *very* sensitive to IO latency (not bandwidth). I believe this is the case
becausecomplex queries tend to interleave CPU intensive code in-between IO requests. So we see this pattern:
 

> Wait 5ms on IO
> Compute for a few ms
> Wait 5ms on IO
> Compute for a few ms
> ...

> We blindly assume that the kernel will magically do read-ahead for us, but I've never seen that work so great. It
certainlyfalls apart on something like an index scan.
 

> If we could instead do this:

> Wait for first IO, issue second IO request
> Compute
> Already have second IO request, issue third
> ...

> We'd be a lot less sensitive to IO latency.

It would take about five minutes of coding to prove or disprove this:
stick a PrefetchBuffer call into heapgetpage() to launch a request for the
next page as soon as we've read the current one, and then see if that
makes any obvious performance difference.  I'm not convinced that it will,
but if it did then we could think about how to make it work for real.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: New CF app deployment
Next
From: Stephen Frost
Date:
Subject: Re: pg_upgrade and rsync