Re: How to improve db performance with $7K? - Mailing list pgsql-performance

From Jacques Caron
Subject Re: How to improve db performance with $7K?
Date
Msg-id 6.2.0.14.0.20050418183007.03d0ce18@pop.interactivemediafactory.net
In response to Re: How to improve db performance with $7K?  (Greg Stark <gsstark@mit.edu>)
Responses Re: How to improve db performance with $7K?
List pgsql-performance
Hi,

At 16:59 18/04/2005, Greg Stark wrote:

>William Yu <wyu@talisys.com> writes:
>
> > Using the above prices for a fixed budget for RAID-10, you could get:
> >
> > SATA 7200 -- 680MB per $1000
> > SATA 10K  -- 200MB per $1000
> > SCSI 10K  -- 125MB per $1000
>
>What a lot of these analyses miss is that cheaper == faster because cheaper
>means you can buy more spindles for the same price. I'm assuming you picked
>equal sized drives to compare so that 200MB/$1000 for SATA is almost twice as
>many spindles as the 125MB/$1000. That means it would have almost double the
>bandwidth. And the 7200 RPM case would have more than 5x the bandwidth.
>
>While 10k RPM drives have lower seek times, and SCSI drives have a natural
>seek time advantage, under load a RAID array with fewer spindles will start
>hitting contention sooner which results into higher latency. If the controller
>works well the larger SATA arrays above should be able to maintain their
>mediocre latency much better under load than the SCSI array with fewer drives
>would maintain its low latency response time despite its drives' lower average
>seek time.

I would definitely agree. More factors in favor of more, cheaper drives:
- cheaper drives (7200 rpm) have larger platters (3.7" diameter versus 2.6"
or 3.3"). The outer tracks hold more data, so the same amount of data sits
on a smaller area, which means fewer tracks, which means shorter seeks. You
can roughly estimate the real average seek time as (average seek time over
the full disk * size of dataset / capacity of disk). And you actually need
to seek physically less often, too. (A quick worked example follows this
list.)

- more disks mean less data per disk, so the data is concentrated even
further toward the outer tracks, which means even lower seek times
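
As a quick illustration with made-up round numbers (not measured figures):
a drive whose average seek over the full platter is 9 ms, but which holds a
slice of the dataset occupying only a quarter of its capacity, would see
roughly 9 * 0.25 = 2.25 ms effective average seeks by the approximation
above.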

Also, what counts is indeed not so much the time it takes to do one single
random seek, but the number of random seeks you can do per second. Hence,
more disks means more seeks per second (if requests are evenly distributed
among all disks, which a good stripe size should achieve).

Not taking into account TCQ/NCQ or write cache optimizations, the important
parameter (random seeks per second) can be approximated as:

N * 1000 / (lat + seek * ds / (N * cap))

Where:
N is the number of disks
lat is the average rotational latency in milliseconds (500/(rpm/60))
seek is the average seek over the full disk in milliseconds
ds is the dataset size
cap is the capacity of each disk
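
In case someone wants to play with it, here is the same approximation
written out as a small Python function (just a sketch of the formula above,
using the same names and units; nothing measured):

def seeks_per_second(n, lat, seek, ds, cap):
    """Approximate random seeks/second for a stripe of n disks.

    n    -- number of disks
    lat  -- average rotational latency in ms, i.e. 500 / (rpm / 60)
    seek -- average seek over the full disk in ms
    ds   -- dataset size (same unit as cap, e.g. GB)
    cap  -- capacity of each disk

    Ignores TCQ/NCQ and write-cache optimizations, as above.
    """
    effective_seek = seek * ds / (n * cap)
    return n * 1000.0 / (lat + effective_seek)

For instance, seeks_per_second(21, 500.0 / (7200 / 60.0), 8.5, 100, 80)
comes out around 4500 (the 8.5 ms average seek is my own guess for a
7200 rpm drive), which is in the same ballpark as the figure below.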

Using this formula with a variety of disks, counting only the disks
themselves (no enclosures, controllers, rack space, power, maintenance...),
and trying to maximize the number of seeks/second for a fixed budget (1000
euros) with a dataset size of 100 GB, SATA drives are clear winners: you
can get more than 4000 seeks/second (with 21 x 80 GB disks) whereas SCSI
cannot even reach the 1400 seeks/second mark (with 8 x 36 GB disks).
Results can vary quite a lot based on the dataset size, which illustrates
the importance of "staying on the edges" of the disks. I'll try to make the
analysis more complete by counting some of the "overhead" (obviously 21
drives have a lot of other implications!), but I believe SATA drives still
win in theory.
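
For anyone who wants to rerun that kind of comparison, here is a minimal
sketch of how it could look, reusing the seeks_per_second() function from
the sketch above. The per-drive specs and prices are placeholders I made up
for illustration, not the exact figures behind the numbers above, so the
output will not reproduce them exactly:

# Reuses the seeks_per_second() sketch above. Drive specs and prices
# below are illustrative placeholders, not the figures used for the
# 4000 / 1400 seeks/second numbers in this mail.
budget = 1000   # euros, drives only
dataset = 100   # GB

candidates = [
    # (name, rpm, avg seek over full disk in ms, capacity in GB, price)
    ("SATA 7200 rpm, 80 GB", 7200, 8.5, 80, 47),
    ("SCSI 10K rpm, 36 GB", 10000, 4.9, 36, 125),
]

for name, rpm, seek, cap, price in candidates:
    n = budget // price              # how many drives fit in the budget
    lat = 500.0 / (rpm / 60.0)       # average rotational latency in ms
    sps = seeks_per_second(n, lat, seek, dataset, cap)
    print("%s: %d drives, ~%d seeks/second" % (name, n, sps))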

It would be interesting to actually compare this to real-world (or
nearly-real-world) benchmarks to measure the effectiveness of features like
TCQ/NCQ etc.

Jacques.



