Re: How to improve db performance with $7K? - Mailing list pgsql-performance
From | Kevin Brown |
---|---|
Subject | Re: How to improve db performance with $7K? |
Date | |
Msg-id | 20050415050336.GE19518@filer |
In response to | Re: How to improve db performance with $7K? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: How to improve db performance with $7K? |
List | pgsql-performance |
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > Tom Lane wrote:
> >> The reason this is so much more of a win than it was when ATA was
> >> designed is that in modern drives the kernel has very little clue about
> >> the physical geometry of the disk. Variable-size tracks, bad-block
> >> sparing, and stuff like that make for a very hard-to-predict mapping
> >> from linear sector addresses to actual disk locations.
>
> > What I mean is that when it comes to scheduling disk activity,
> > knowledge of the specific physical geometry of the disk isn't really
> > important.
>
> Oh?
>
> Yes, you can probably assume that blocks with far-apart numbers are
> going to require a big seek, and you might even be right in supposing
> that a block with an intermediate number should be read on the way.
> But you have no hope at all of making the right decisions at a more
> local level --- say, reading various sectors within the same cylinder
> in an optimal fashion. You don't know where the track boundaries are,
> so you can't schedule in a way that minimizes rotational latency.

This is true, but has to be examined in the context of the workload.

If the workload is a sequential read, for instance, then the question
becomes whether or not giving the controller a set of sequential
blocks (in block ID order) will get you maximum read throughput.
Given that the manufacturers all attempt to generate the biggest read
throughput numbers, I think it's reasonable to assume that (a) the
sectors are ordered within a cylinder such that reading block x + 1
immediately after block x will incur the smallest possible amount of
delay if requested quickly enough, and (b) the same holds true when
block x + 1 is on the next cylinder.

In the case of pure random reads, you'll end up having to wait an
average of half of a rotation before beginning the read. Where SCSI
buys you something here is when you have sequential chunks of reads
that are randomly distributed. The SCSI drive can determine which
block in the set to start with first. But for that to really be a big
win, the chunks themselves would have to span more than half a track
at least, else you'd have a greater than half a track gap in the
middle of your two sorted sector lists for that track (a really
well-engineered SCSI disk could take advantage of the fact that there
are multiple platters and fill the "gap" with reads from a different
platter).

Admittedly, this can be quite a big win. With an average rotational
latency of 4 milliseconds on a 7200 RPM disk, being able to begin the
read at the earliest possible moment will shave at most 25% off the
total average random-access latency, if the average seek time is 12
milliseconds.

> That might be the case with respect to decisions about long seeks,
> but not with respect to rotational latency. The kernel simply hasn't
> got the information.

True, but that should reduce the total latency by something like 17%
(on average). Not trivial, to be sure, but not an order of magnitude,
either.

--
Kevin Brown
kevin@sysexperts.com
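
The "at most 25%" figure in the message can be checked with a few lines of arithmetic. What follows is only a sketch of that calculation, not anything taken from the thread: the function names are made up for illustration, and the model assumes a random access costs one average seek plus half a rotation. It gives the upper bound only and does not attempt to derive the "something like 17%" average estimate.

```python
# Back-of-the-envelope check of the rotational-latency numbers above.
# All names are hypothetical; the model (seek + half a rotation) is an
# assumption made for illustration.

def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average a random read waits half a rotation for its sector."""
    return rotation_time_ms(rpm) / 2.0


def max_latency_saving(rpm: float, avg_seek_ms: float) -> float:
    """Upper bound on the fraction of random-access latency that could be
    saved if the rotational wait were eliminated entirely."""
    rot = avg_rotational_latency_ms(rpm)
    return rot / (avg_seek_ms + rot)


if __name__ == "__main__":
    rpm, seek_ms = 7200, 12.0  # the figures used in the message
    print(f"average rotational latency: {avg_rotational_latency_ms(rpm):.2f} ms")
    print(f"maximum saving: {max_latency_saving(rpm, seek_ms):.1%}")
```

With the unrounded rotational latency (about 4.17 ms rather than 4 ms) the bound comes out slightly above 25%, which agrees with the rounded figure in the message.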