Re: Sequential I/O Cost (was Re: A Better External Sort?) - Mailing list pgsql-performance
From | Ron Peacetree
---|---
Subject | Re: Sequential I/O Cost (was Re: A Better External Sort?)
Date |
Msg-id | 2944051.1127979774218.JavaMail.root@elwamui-polski.atl.sa.earthlink.net
In response to | Sequential I/O Cost (was Re: A Better External Sort?) ("Jeffrey W. Baker" <jwbaker@acm.org>)
List | pgsql-performance
>From: "Jeffrey W. Baker" <jwbaker@acm.org> >Sent: Sep 29, 2005 12:33 AM >Subject: Sequential I/O Cost (was Re: [PERFORM] A Better External Sort?) > >On Wed, 2005-09-28 at 12:03 -0400, Ron Peacetree wrote: >>>From: "Jeffrey W. Baker" <jwbaker@acm.org> >>>Perhaps I believe this because you can now buy as much sequential I/O >>>as you want. Random I/O is the only real savings. >>> >> 1= No, you can not "buy as much sequential IO as you want". Even if >> with an infinite budget, there are physical and engineering limits. Long >> before you reach those limits, you will pay exponentially increasing costs >> for linearly increasing performance gains. So even if you _can_ buy a >> certain level of sequential IO, it may not be the most efficient way to >> spend money. > >This is just false. You can buy sequential I/O for linear money up to >and beyond your platform's main memory bandwidth. Even 1GB/sec will >severely tax memory bandwidth of mainstream platforms, and you can >achieve this rate for a modest cost. > I don't think you can prove this statement. A= www.pricewatch.com lists 7200rpm 320GB SATA II HDs for ~$160. ASTR according to www.storagereview.com is ~50MBps. Average access time is ~12-13ms. Absolute TOTL 15Krpm 147GB U320 or FC HDs cost ~4x as much per GB, yet only deliver ~80-90MBps ASTR and average access times of ~5.5-6.0ms. Your statement is clearly false in terms of atomic raw HD performance. B= low end RAID controllers can be obtained for a few $100's. But even amongst them, a $600+ card does not perform 3-6x better than a $100-$200 card. When the low end HW is not enough, the next step in price is to ~$10K+ (ie Xyratex), and the ones after that are to ~$100K+ (ie NetApps) and ~$1M+ (ie EMC, IBM, etc). None of these ~10x steps in price results in a ~10x increase in performance. Your statement is clearly false in terms of HW based RAID performance. C= A commodity AMD64 mainboard with a dual channel DDR PC3200 RAM subsystem has 6.4GBps of bandwidth. These are as common as weeds and almost as cheap: www.pricewatch.com Your statement about commodity systems main memory bandwidth being "severely taxed at 1GBps" is clearly false. D= Xyratecs makes RAID HW for NetApps and EMC. NONE of their current HW can deliver 1GBps. More like 600-700MBps. Engino and Dot Hill have similar limitations on their current products. No PCI or PCI-X based HW could ever do more than ~800-850MBps since that's the RW limit of those busses. Next Gen products are likely to 2x those limits and cross the 1GBps barrier based on ~90MBps SAS or FC HD's and PCI-Ex8 (2GBps max) and PCI-Ex16 (4GBps max). Note that not even next gen or 2 gens from now RAID HW will be able to match the memory bandwidth of the current commodity memory subsystem mentioned in "C" above. Your statement that one can achieve a HD IO rate that will tax RAM bandwidth at modest cost is clearly false. QED Your statement is false on all counts and in all respects. >I have one array that can supply this rate and it has only 15 disks. It >would fit on my desk. I think your dire talk about the limits of >science and engineering may be a tad overblown. > Name it and post its BOM, configuration specs, price and ordering information. Then tell us what it's plugged into and all the same details on _that_. If all 15 HD's are being used for one RAID set, then you can't be using RAID 10, which means any claims re: write performance in particular should be closely examined. 
A 15-volume RAID 5 made of the fastest 15Krpm U320 or FC HDs, each with ~85.9MBps ASTR, could in theory do ~14*85.9MBps = ~1.2GBps raw ASTR, at least for reads. But no one I know of makes commodity RAID HW that can keep up with this, nor can any one PCI-X bus support it even if such commodity RAID HW did exist. Hmmm. SW RAID on at least a PCI-E x8 bus might be able to do it if we can multiplex enough 4Gbps FC lines (4Gbps = 400MBps => max of 4 of the above HDs per line, so 4 FC lines) with low enough latency and have enough CPU driving it... Won't be easy or cheap though.
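That estimate reduces to a one-liner. A rough sketch (Python; the disk
count, ASTR and bus figures are the assumed values above, and the "PCI-E
x8 SW RAID" case is purely hypothetical):

# Rough aggregate-throughput estimate for the 15-volume RAID 5 read case
# described above (all figures are illustrative assumptions).

def raid5_read_astr(n_disks, per_disk_astr_mbps, bus_limit_mbps):
    """Ideal sequential-read rate: n-1 data disks streaming in parallel,
    capped by whatever bus/controller sits in front of them."""
    raw = (n_disks - 1) * per_disk_astr_mbps
    return min(raw, bus_limit_mbps), raw

for bus_name, bus_limit in [("PCI-X class controller", 850),
                            ("PCI-E x8 SW RAID (hypothetical)", 2000)]:
    capped, raw = raid5_read_astr(15, 85.9, bus_limit)
    print(f"{bus_name}: raw ~{raw:.0f} MBps, deliverable ~{capped:.0f} MBps")

# FC fan-out check: a 4Gbps FC line is ~400MBps, so at ~85.9MBps per HD
# each line can feed at most 400 // 85.9 = 4 such HDs, i.e. 15 HDs need
# 4 lines before the links themselves become the choke point.
print(int(400 // 85.9), "HDs per 4Gbps FC line")

The PCI-X case caps at ~850MBps no matter how many HDs you add, which is
why only the SW RAID over PCI-E configuration has a shot at the full
~1.2GBps.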