Re: How to improve db performance with $7K? - Mailing list pgsql-performance

From Kevin Brown
Subject Re: How to improve db performance with $7K?
Msg-id 20050415050336.GE19518@filer
In response to Re: How to improve db performance with $7K?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: How to improve db performance with $7K?
List pgsql-performance
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > Tom Lane wrote:
> >> The reason this is so much more of a win than it was when ATA was
> >> designed is that in modern drives the kernel has very little clue about
> >> the physical geometry of the disk.  Variable-size tracks, bad-block
> >> sparing, and stuff like that make for a very hard-to-predict mapping
> >> from linear sector addresses to actual disk locations.
>
> > What I mean is that when it comes to scheduling disk activity,
> > knowledge of the specific physical geometry of the disk isn't really
> > important.
>
> Oh?
>
> Yes, you can probably assume that blocks with far-apart numbers are
> going to require a big seek, and you might even be right in supposing
> that a block with an intermediate number should be read on the way.
> But you have no hope at all of making the right decisions at a more
> local level --- say, reading various sectors within the same cylinder
> in an optimal fashion.  You don't know where the track boundaries are,
> so you can't schedule in a way that minimizes rotational latency.

This is true, but it has to be examined in the context of the workload.

If the workload is a sequential read, for instance, then the question
becomes whether or not giving the controller a set of sequential
blocks (in block ID order) will get you maximum read throughput.
Given that the manufacturers all attempt to generate the biggest read
throughput numbers, I think it's reasonable to assume that (a) the
sectors are ordered within a cylinder such that reading block x + 1
immediately after block x will incur the smallest possible amount of
delay if requested quickly enough, and (b) the same holds true when
block x + 1 is on the next cylinder.

In the case of pure random reads, you'll end up having to wait an
average of half a rotation before beginning the read.  Where SCSI
buys you something here is when you have sequential chunks of reads
that are randomly distributed.  The SCSI drive can determine which
block in the set to read first.  But for that to be a really big
win, the chunks themselves would have to span at least half a track,
else you'd have a gap of more than half a track in the
middle of your two sorted sector lists for that track (a really
well-engineered SCSI disk could take advantage of the fact that there
are multiple platters and fill the "gap" with reads from a different
platter).
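To make the "which block to start with" point concrete, here is a toy model in Python comparing two ways of reading a set of sectors that all sit on one track. It is a sketch under loud assumptions: a single track of hypothetical `sectors_per_track` equal-size sectors, the head already on that track and aligned with the start of sector `head`, time measured in sector-times, and no settle or command overhead. The function names are illustrative, not any real driver's API. The "ascending" case models a host that only knows block numbers and issues them lowest first; the "rotational" case models a drive that reads each request as it passes under the head.

```python
def ascending_order_time(head, requested, sectors_per_track):
    """Sector-times to read all requested sectors in ascending ID order.

    Toy model of a host that knows only block numbers: wait for the
    lowest requested sector to rotate under the head, then sweep upward
    to the highest one.
    """
    ordered = sorted(requested)
    wait = (ordered[0] - head) % sectors_per_track
    sweep = ordered[-1] - ordered[0] + 1
    return wait + sweep


def rotational_order_time(head, requested, sectors_per_track):
    """Sector-times when the drive reads each requested sector as it comes
    around, starting with whichever one reaches the head first."""
    # Rotational distance (in sectors) from the head to each request.
    offsets = [(s - head) % sectors_per_track for s in requested]
    # The request farthest ahead of the head finishes last.
    return max(offsets) + 1


# Hypothetical 100-sector track holding two short chunks of requests.
track = 100
chunks = [10, 11, 12, 60, 61, 62]
for head in (10, 50):
    print(head,
          ascending_order_time(head, chunks, track),
          rotational_order_time(head, chunks, track))
```

In this model, with the head at sector 50 the ascending order costs 113 sector-times against 63 for rotational order, while with the head at sector 10 both come to 53. The rotational order can never lose, since every requested sector has to pass under the head at least once, but how much it gains depends on where the head happens to land relative to the gap between the chunks.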


Admittedly, this can be quite a big win.  With an average rotational
latency of about 4 milliseconds on a 7200 RPM disk and an average seek
time of 12 milliseconds, being able to begin the read at the earliest
possible moment will shave at most 25% (4 ms of the 16 ms total) off
the average random-access latency.
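The arithmetic behind that 25% bound can be checked with a few lines of Python. This sketch assumes, as the paragraph does, that the best case removes rotational latency entirely while seek time stays fixed. The exact half-revolution at 7200 RPM is about 4.17 ms, which the figure above rounds to 4 ms, so the unrounded bound comes out slightly higher.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def max_rotational_saving(rpm, avg_seek_ms):
    """Upper bound on the fraction of average random-access latency saved
    if rotational latency were eliminated entirely (seek time unchanged)."""
    rot = avg_rotational_latency_ms(rpm)
    return rot / (avg_seek_ms + rot)


print(f"{avg_rotational_latency_ms(7200):.2f} ms")   # 4.17 ms
print(f"{max_rotational_saving(7200, 12):.1%}")      # 25.8%
```

With the rounded 4 ms figure the same formula gives exactly 4 / (12 + 4) = 25%.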

> That might be the case with respect to decisions about long seeks,
> but not with respect to rotational latency.  The kernel simply hasn't
> got the information.

True, but exploiting rotational position should reduce the total
latency by something like 17% (on average).  Not trivial, to be sure,
but not an order of magnitude, either.


--
Kevin Brown                          kevin@sysexperts.com
