Re: Sequential I/O Cost (was Re: A Better External Sort?) - Mailing list pgsql-performance

From Ron Peacetree
Subject Re: Sequential I/O Cost (was Re: A Better External Sort?)
Date
Msg-id 2944051.1127979774218.JavaMail.root@elwamui-polski.atl.sa.earthlink.net
In response to Sequential I/O Cost (was Re: A Better External Sort?)  ("Jeffrey W. Baker" <jwbaker@acm.org>)
List pgsql-performance
>From: "Jeffrey W. Baker" <jwbaker@acm.org>
>Sent: Sep 29, 2005 12:33 AM
>Subject: Sequential I/O Cost (was Re: [PERFORM] A Better External Sort?)
>
>On Wed, 2005-09-28 at 12:03 -0400, Ron Peacetree wrote:
>>>From: "Jeffrey W. Baker" <jwbaker@acm.org>
>>>Perhaps I believe this because you can now buy as much sequential I/O
>>>as you want.  Random I/O is the only real savings.
>>>
>> 1= No, you can not "buy as much sequential IO as you want".  Even
>> with an infinite budget, there are physical and engineering limits.  Long
>> before you reach those limits, you will pay exponentially increasing costs
>> for linearly increasing performance gains.  So even if you _can_ buy a
>> certain level of sequential IO, it may not be the most efficient way to
>> spend money.
>
>This is just false.  You can buy sequential I/O for linear money up to
>and beyond your platform's main memory bandwidth.  Even 1GB/sec will
>severely tax memory bandwidth of mainstream platforms, and you can
>achieve this rate for a modest cost.
>
I don't think you can prove this statement.
A= www.pricewatch.com lists 7200rpm 320GB SATA II HDs for ~$160.
ASTR (average sustained transfer rate) according to
www.storagereview.com is ~50MBps.  Average access time is ~12-13ms.
Absolute top-of-the-line 15Krpm 147GB U320 or FC HDs cost ~4x as
much per GB, yet only deliver ~80-90MBps ASTR and average access
times of ~5.5-6.0ms.
Your statement is clearly false in terms of atomic raw HD performance.
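A quick back-of-the-envelope check of the figures above.  The SATA numbers are the ones quoted; the 15Krpm price is not a quoted street price but is derived from the "~4x as much per GB" claim, so treat it as an estimate:

```python
# Drive figures quoted above (approx. 2005 numbers).  The 15Krpm price
# is an assumption derived from the "~4x per GB" claim, not a quote.
sata = {"price": 160.0, "gb": 320, "mbps": 50.0}        # 7200rpm SATA II
scsi_price = 4 * (sata["price"] / sata["gb"]) * 147     # ~4x $/GB, 147GB
scsi = {"price": scsi_price, "gb": 147, "mbps": 85.0}   # 15Krpm U320/FC

for name, d in (("SATA 7200rpm", sata), ("U320 15Krpm", scsi)):
    print(f'{name}: ${d["price"] / d["gb"]:.2f}/GB, '
          f'${d["price"] / d["mbps"]:.2f} per MBps of ASTR')
```

The point it illustrates: you pay ~4x per GB for well under 2x the sequential rate, so past the commodity tier, dollars buy access time, not proportionally more sequential IO.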

B= Low-end RAID controllers can be obtained for a few hundred dollars.
But even amongst them, a $600+ card does not perform 3-6x better than
a $100-$200 card.  When the low-end HW is not enough, the next step
up in price is to ~$10K+ (i.e. Xyratex), and the ones after that are to
~$100K+ (i.e. NetApp) and ~$1M+ (i.e. EMC, IBM, etc).  None of these
~10x steps in price results in a ~10x increase in performance.
Your statement is clearly false in terms of HW based RAID performance.

C= A commodity AMD64 mainboard with a dual-channel DDR PC3200
RAM subsystem has 6.4GBps of bandwidth (2 channels x 3.2GBps per
channel).  These are as common as weeds and almost as cheap:
www.pricewatch.com
Your statement about commodity systems' main memory bandwidth
being "severely taxed at 1GBps" is clearly false.
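The 6.4GBps figure falls straight out of the DDR PC3200 parameters, as a quick check:

```python
# PC3200 bandwidth check: DDR transfers twice per 200MHz bus clock
# across a 64-bit (8-byte) channel; "dual channel" doubles that.
bus_mhz = 200
per_channel = bus_mhz * 2 * 8   # MB/s: clock x DDR x bytes/transfer
total = 2 * per_channel         # dual channel
print(per_channel, total)       # 3200 MB/s per channel, 6400 MB/s total
```

So a 1GBps IO stream would consume only about one sixth of the theoretical peak of a 2005 commodity desktop board.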

D= Xyratex makes RAID HW for NetApp and EMC.  NONE of their
current HW can deliver 1GBps.  More like 600-700MBps.  Engenio and
Dot Hill have similar limitations on their current products.  No PCI or
PCI-X based HW could ever do more than ~800-850MBps, since
that's the real-world limit of those busses.  Next-gen products are
likely to double those limits and cross the 1GBps barrier, based on
~90MBps SAS or FC HDs and PCI-E x8 (2GBps max) and PCI-E x16
(4GBps max).
Note that not even next-gen, or the generation after that, RAID HW
will be able to match the memory bandwidth of the current commodity
memory subsystem mentioned in "C" above.
Your statement that one can achieve a HD IO rate that will tax RAM
bandwidth at modest cost is clearly false.
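The bus ceilings behind those numbers can be checked directly.  These are theoretical peaks; the ~800-850MBps PCI-X figure in the text is the practical limit after protocol overhead:

```python
# Theoretical bus peaks (the text's PCI-X number is the practical one).
pcix_peak = 133 * 8                      # MB/s: 133MHz x 8-byte bus = 1064
pcie_lane = 2.5e9 * 8 / 10 / 8 / 1e6     # MB/s per Gen1 lane after
                                         # 8b/10b encoding = 250
print(pcix_peak, pcie_lane * 8, pcie_lane * 16)  # x8 -> 2000, x16 -> 4000
```

Even the PCI-E x16 peak (4GBps) sits well below the 6.4GBps memory subsystem in "C".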

QED: Your statement is false on all counts and in all respects.


>I have one array that can supply this rate and it has only 15 disks.  It
>would fit on my desk.  I think your dire talk about the limits of
>science and engineering may be a tad overblown.
>
Name it and post its BOM, configuration specs, price and ordering
information.  Then tell us what it's plugged into and all the same
details on _that_.

If all 15 HD's are being used for one RAID set, then you can't be
using RAID 10, which means any claims re: write performance in
particular should be closely examined.

A 15-volume RAID 5 made of the fastest 15Krpm U320 or FC HDs,
each with ~85.9MBps ASTR, could in theory do ~14*85.9= ~1.2GBps
raw ASTR, at least for reads (14 rather than 15, since one drive's
worth of capacity goes to parity).  But no one I know of makes
commodity RAID HW that can keep up with this, nor can any one
PCI-X bus support it even if such commodity RAID HW did exist.
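The theoretical read ceiling of that array, spelled out:

```python
# 15-drive RAID 5: 14 data spindles contribute to sequential reads
# (one drive's worth of capacity is parity), each at ~85.9MBps ASTR.
drives, astr_mbps = 15, 85.9
raw_read = (drives - 1) * astr_mbps
print(f"{raw_read:.1f} MBps")   # ~1202.6 MBps, just over 1.2GBps
```

That is ~40% beyond what a single PCI-X bus (~850MBps practical) can carry.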

Hmmm.  SW RAID on at least a PCI-E x8 bus might be able to do it, if
we can multiplex enough 4Gbps FC lines (4Gbps= 400MBps => max
of 4 of the above HDs per line, so 4 FC lines) with low enough latency,
and if we have enough CPU driving it... It won't be easy or cheap,
though.
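The FC fan-out above can be checked with a couple of divisions; the 400MBps payload figure for 4Gbps FC is the rule-of-thumb value used in the text:

```python
import math

# How many 4Gbps FC links are needed to feed 15 drives at ~85.9MBps
# each?  (400MB/s usable per link is the text's rule of thumb.)
link_mbps = 400.0
drives_per_link = int(link_mbps // 85.9)         # 4 drives fit per link
links_needed = math.ceil(15 / drives_per_link)   # 4 links for 15 drives
print(drives_per_link, links_needed, links_needed * link_mbps)
```

Four links give 1600MBps of aggregate link bandwidth, enough headroom for the ~1.2GBps raw read rate, and a PCI-E x8 slot (2GBps) could carry it.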
