Re: How to improve db performance with $7K? - Mailing list pgsql-performance
From | Jacques Caron |
---|---|
Subject | Re: How to improve db performance with $7K? |
Date | |
Msg-id | 6.2.0.14.0.20050418183007.03d0ce18@pop.interactivemediafactory.net |
In response to | Re: How to improve db performance with $7K? (Greg Stark <gsstark@mit.edu>) |
Responses | Re: How to improve db performance with $7K? |
List | pgsql-performance |
Hi,

At 16:59 18/04/2005, Greg Stark wrote:
>William Yu <wyu@talisys.com> writes:
>
> > Using the above prices for a fixed budget for RAID-10, you could get:
> >
> > SATA 7200 -- 680MB per $1000
> > SATA 10K -- 200MB per $1000
> > SCSI 10K -- 125MB per $1000
>
>What a lot of these analyses miss is that cheaper == faster because cheaper
>means you can buy more spindles for the same price. I'm assuming you picked
>equal sized drives to compare so that 200MB/$1000 for SATA is almost twice as
>many spindles as the 125MB/$1000. That means it would have almost double the
>bandwidth. And the 7200 RPM case would have more than 5x the bandwidth.
>
>While 10k RPM drives have lower seek times, and SCSI drives have a natural
>seek time advantage, under load a RAID array with fewer spindles will start
>hitting contention sooner which results into higher latency. If the controller
>works well the larger SATA arrays above should be able to maintain their
>mediocre latency much better under load than the SCSI array with fewer drives
>would maintain its low latency response time despite its drives' lower average
>seek time.

I would definitely agree. More factors in favor of more cheap drives:

- cheaper drives (7200 rpm) have larger platters (3.7" diameter against 2.6"
  or 3.3"). That means the outer tracks hold more data, and the same amount
  of data is held on a smaller area, which means fewer tracks, which means
  reduced seek times. You can roughly count the real average seek time as
  (average seek time over full disk * size of dataset / capacity of disk).
  And you actually need to physically seek less often too.

- more disks means less data per disk, which means the data is further
  concentrated on the outer tracks, which means even lower seek times.

Also, what counts is indeed not so much the time it takes to do one single
random seek, but the number of random seeks you can do per second.
Hence, more disks means more seeks per second (if requests are evenly
distributed among all disks, which a good stripe size should achieve).

Not taking into account TCQ/NCQ or write cache optimizations, the important
parameter (random seeks per second) can be approximated as:

    N * 1000 / (lat + seek * ds / (N * cap))

Where:
  N    is the number of disks
  lat  is the average rotational latency in milliseconds (500/(rpm/60))
  seek is the average seek time over the full disk in milliseconds
  ds   is the dataset size
  cap  is the capacity of each disk (in the same unit as ds)

Using this formula and a variety of disks, counting only the disks
themselves (no enclosures, controllers, rack space, power, maintenance...),
and trying to maximize the number of seeks/second for a fixed budget
(1000 euros) with a dataset size of 100 GB, SATA drives are clear winners:
you can get more than 4000 seeks/second (with 21 x 80 GB disks), where SCSI
cannot even make it to the 1400 seeks/second point (with 8 x 36 GB disks).

Results can vary quite a lot based on the dataset size, which illustrates
the importance of "staying on the edges" of the disks.

I'll try to make the analysis more complete by counting some of the
"overhead" (obviously 21 drives have a lot of other implications!), but I
believe SATA drives still win in theory.

It would be interesting to actually compare this to real-world (or
nearly-real-world) benchmarks to measure the effectiveness of features like
TCQ/NCQ etc.

Jacques.
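
For anyone who wants to play with the numbers, the approximation above can be sketched in a few lines of Python. This is only an illustration of the formula as written in this message: the function names are made up here, and the 8.5 ms full-disk average seek time in the example is an assumed figure for a generic 7200 rpm drive, not a value taken from the analysis above.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency in ms: the time for half a revolution."""
    return 500.0 / (rpm / 60.0)


def seeks_per_second(n_disks, rpm, full_seek_ms, dataset_size, disk_capacity):
    """Approximate random seeks/second for an array of identical disks.

    Implements N * 1000 / (lat + seek * ds / (N * cap)).
    dataset_size and disk_capacity must use the same unit (e.g. GB).
    Ignores TCQ/NCQ and write caches, and assumes requests are spread
    evenly over all disks.
    """
    lat = rotational_latency_ms(rpm)
    # The full-disk seek time is scaled by the fraction of each disk
    # that the dataset actually occupies.
    effective_seek = full_seek_ms * dataset_size / (n_disks * disk_capacity)
    return n_disks * 1000.0 / (lat + effective_seek)


if __name__ == "__main__":
    # Assumed example: 21 x 80 GB, 7200 rpm, 8.5 ms average seek (hypothetical).
    for dataset_gb in (50, 100, 200):
        rate = seeks_per_second(21, 7200, 8.5, dataset_gb, 80)
        print(f"dataset {dataset_gb:>3} GB: about {rate:.0f} seeks/second")
```

Varying dataset_size and n_disks shows the two effects described above: more disks add seek capacity directly, and they also shrink the part of each disk that the heads have to travel over.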