Re: RAID arrays and performance - Mailing list pgsql-performance

From Gregory Stark
Subject Re: RAID arrays and performance
Date
Msg-id 87fxyiik94.fsf@oxford.xeocode.com
In response to Re: RAID arrays and performance  (Mark Mielke <mark@mark.mielke.cc>)
Responses Re: RAID arrays and performance
List pgsql-performance
"Mark Mielke" <mark@mark.mielke.cc> writes:

> Matthew wrote:
>
>> I don't think you would have to create a more intelligent table scanning
>> algorithm. What you would need to do is take the results of the index,
>> convert that to a list of page fetches, then pass that list to the OS as
>> an asynchronous "please fetch all these into the buffer cache" request,
>> then do the normal algorithm as is currently done. The requests would then
>> come out of the cache instead of from the disc. Of course, this is from a
>> simple Java programmer who doesn't know the OS interfaces for this sort of
>> thing.
>
> That's about how the talk went. :-)
>
> The problem is that a 12X speedup for 12 disks seems unlikely except under very
> specific loads (such as a sequential scan of a single table). Each of the
> indexes may need to be scanned or searched in turn, then each of the tables
> would need to be scanned or searched in turn, depending on the query plan.
> There is no guarantee that the index rows or the table rows are equally spread
> across the 12 disks. CPU processing also becomes involved, and it is currently
> limited to a single processor thread. I suspect no database would achieve a 12X speedup
> for 12 disks unless a simple sequential scan of a single table was required, in
> which case the reads could be fully parallelized with RAID 0 using standard
> sequential reads, and this is available today using built-in OS or disk
> read-ahead.

I'm sure you would get something between 1x and 12x though...

I'm rerunning my synthetic readahead tests now. Those tests don't show the effect
of the other CPU and I/O work being done in the meantime, but surely if the
prefetched pages are being evicted from cache too soon, that just means your
machine is starved for cache and you should add more RAM?
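For concreteness, the sort of asynchronous "please fetch all these into the
buffer cache" request Matthew describes could look roughly like the following.
This is only a sketch: it assumes posix_fadvise() with POSIX_FADV_WILLNEED as
the OS interface, which is one possibility among several, and the file name and
block numbers are made up for illustration.

/*
 * Sketch only: hint the kernel to start reading a list of heap blocks
 * ahead of time, so the later reads come out of the OS cache.
 * Assumes posix_fadvise(POSIX_FADV_WILLNEED) is available; the file
 * name and block numbers below are invented for illustration.
 */
#define _XOPEN_SOURCE 600

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192

static void
prefetch_blocks(int fd, const long *blknos, int nblocks)
{
    int     i;

    for (i = 0; i < nblocks; i++)
    {
        /* Ask the kernel to start fetching this block asynchronously. */
        int     rc = posix_fadvise(fd, (off_t) blknos[i] * BLCKSZ,
                                   BLCKSZ, POSIX_FADV_WILLNEED);

        if (rc != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
    }
}

int
main(void)
{
    long    blknos[] = {17, 4242, 99, 12345};  /* pages the index pointed at */
    int     fd = open("/tmp/bigtable", O_RDONLY);  /* made-up relation file */

    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    prefetch_blocks(fd, blknos, 4);
    /* ... then read the blocks normally; they should already be cached ... */
    close(fd);
    return 0;
}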

Also, it's true, you need to preread more than 12 blocks to handle a 12-disk
RAID. My offhand combinatorics analysis seems to indicate you would expect to
need about n(n-1)/2 blocks on average before you've hit all n disks. There's
little penalty to prereading unless you use up too many kernel resources or do
unnecessary I/O that you never use, so I would expect prereading n^2 blocks,
capped at some reasonable number like 1,000 pages (enough to handle a 32-disk
RAID), would be reasonable.
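To sanity-check that back-of-the-envelope figure, here is a toy simulation
(purely illustrative, and it assumes each requested block lands on a uniformly
random spindle, which real striping isn't): it counts how many requests are
needed before every disk in the stripe has at least one block to fetch. The
exact average depends on the model, but either way it comes out well above n,
which is the point.

/*
 * Toy simulation: how many uniformly random block requests are needed
 * before every spindle in an n-disk stripe has at least one to work on?
 * Purely a sanity check on the estimate above; real block placement is
 * not uniformly random.
 */
#include <stdio.h>
#include <stdlib.h>

static int
requests_to_cover(int ndisks)
{
    int    *hit = calloc(ndisks, sizeof(int));
    int     covered = 0;
    int     requests = 0;

    while (covered < ndisks)
    {
        int     d = rand() % ndisks;   /* spindle this block happens to hit */

        requests++;
        if (!hit[d])
        {
            hit[d] = 1;
            covered++;
        }
    }
    free(hit);
    return requests;
}

int
main(void)
{
    const int   ndisks = 12;
    const int   trials = 100000;
    long        total = 0;
    int         i;

    srand(12345);
    for (i = 0; i < trials; i++)
        total += requests_to_cover(ndisks);

    printf("average requests before all %d disks are busy: %.1f\n",
           ndisks, (double) total / trials);
    return 0;
}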

The real trick is avoiding prefetches that are never needed, since the user
may never actually read all the tuples being requested. I think that means we
shouldn't prefetch until the second tuple is read, and then gradually increase
the prefetch distance as more and more of the results are read.
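To make that ramp-up concrete, a sketch of the shape of the policy (the names
and numbers here are hypothetical, not from any patch): no prefetch for the
first tuple, then a prefetch distance that doubles as more results are
consumed, capped as discussed above. Run standalone it just prints the distance
after each tuple, i.e. the doubling curve flattening out at the cap.

/*
 * Hypothetical sketch of the ramp-up policy described above: no prefetch
 * for the first tuple, then a prefetch distance that doubles as more
 * results are consumed, capped so we never queue unbounded I/O.
 * Names and numbers are illustrative only.
 */
#include <stdio.h>

#define MAX_PREFETCH_PAGES 1000     /* cap from the discussion above */

typedef struct PrefetchState
{
    int     tuples_read;    /* result tuples the caller has consumed */
    int     distance;       /* pages ahead we are willing to request */
} PrefetchState;

static int
next_prefetch_distance(PrefetchState *st)
{
    st->tuples_read++;

    /* Don't prefetch at all until the caller asks for a second tuple. */
    if (st->tuples_read < 2)
        return 0;

    /* Start small and double, so a LIMIT 5 query never floods the disks. */
    if (st->distance == 0)
        st->distance = 1;
    else if (st->distance < MAX_PREFETCH_PAGES)
        st->distance = (st->distance * 2 > MAX_PREFETCH_PAGES)
            ? MAX_PREFETCH_PAGES : st->distance * 2;

    return st->distance;
}

int
main(void)
{
    PrefetchState st = {0, 0};
    int     i;

    for (i = 1; i <= 15; i++)
        printf("tuple %2d -> prefetch distance %d\n",
               i, next_prefetch_distance(&st));
    return 0;
}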

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
