Re: How to improve db performance with $7K? - Mailing list pgsql-performance

From Kevin Brown
Subject Re: How to improve db performance with $7K?
Msg-id 20050415020337.GD19518@filer
In response to Re: How to improve db performance with $7K?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > I really don't see how this is any different between a system that has
> > tagged queueing to the disks and one that doesn't.  The only
> > difference is where the queueing happens.  In the case of SCSI, the
> > queueing happens on the disks (or at least on the controller).  In the
> > case of SATA, the queueing happens in the kernel.
>
> That's basically what it comes down to: SCSI lets the disk drive itself
> do the low-level I/O scheduling whereas the ATA spec prevents the drive
> from doing so (unless it cheats, ie, caches writes).  Also, in SCSI it's
> possible for the drive to rearrange reads as well as writes --- which
> AFAICS is just not possible in ATA.  (Maybe in the newest spec...)
>
> The reason this is so much more of a win than it was when ATA was
> designed is that in modern drives the kernel has very little clue about
> the physical geometry of the disk.  Variable-size tracks, bad-block
> sparing, and stuff like that make for a very hard-to-predict mapping
> from linear sector addresses to actual disk locations.

Yeah, but it's not clear to me, at least, that this is a first-order
consideration.  A second-order consideration, sure, I'll grant that.

What I mean is that when it comes to scheduling disk activity,
knowledge of the specific physical geometry of the disk isn't really
important.  What's important is whether or not the disk conforms to a
certain set of expectations: namely, that the general organization is
such that addressing the blocks in block number order guarantees
maximum throughput.
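
To make that expectation concrete, here is a minimal sketch of a
one-directional elevator ordering, assuming the linear block number is
a usable proxy for head position. That is exactly the assumption under
discussion, not a property of any real drive. The names elevator_order
and head_travel are hypothetical, and the distances are in block
numbers, not time.

```python
def elevator_order(pending, head):
    """Order pending block numbers as a one-directional (C-SCAN style) sweep.

    Blocks at or beyond the current head position are served first, in
    ascending order; the remaining blocks are served in ascending order
    after the head wraps around.
    """
    ahead = sorted(b for b in pending if b >= head)
    behind = sorted(b for b in pending if b < head)
    return ahead + behind


def head_travel(order, head):
    """Total distance, in block numbers, the head moves while serving `order`."""
    total = 0
    for block in order:
        total += abs(block - head)
        head = block
    return total


# Toy request queue (made-up block numbers), head starting at block 400.
requests = [900, 120, 450, 130, 880, 460]
fifo = head_travel(requests, head=400)                              # 3100
swept = head_travel(elevator_order(requests, head=400), head=400)   # 1290
```

In this toy model the sorted sweep covers less than half the distance
of arrival order. How well that carries over to a real disk depends on
how closely block numbers track physical position.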

Now, bad block remapping destroys that guarantee, but unless you've
got a LOT of bad blocks, it shouldn't destroy your performance, right?

> Combine that with the fact that the drive controller can be much
> smarter than it was twenty years ago, and you can see that the case
> for doing I/O scheduling in the kernel and not in the drive is
> pretty weak.

Well, I certainly grant that allowing the controller to do the I/O
scheduling is faster than having the kernel do it, as long as it can
handle insertion of new requests into the list while it's in the
middle of executing a request.  The most obvious case is when the head
is in motion and the new request can be satisfied by reading from the
media between where the head is at the time of the new request and
where the head is being moved to.
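
As a rough illustration of that case, the sketch below keeps an
upward sweep as a sorted list and lets a request that arrives
mid-sweep join the current pass if its block is still ahead of the
head. Otherwise it waits for the next pass. The Sweep class and its
methods are hypothetical names, and block numbers again stand in for
physical position. A real drive or kernel scheduler is more involved
than this.

```python
import bisect


class Sweep:
    """Toy upward sweep that accepts new requests while it is in progress."""

    def __init__(self, head, pending):
        self.head = head
        # Blocks still ahead of the head are served in this pass.
        self.current = sorted(b for b in pending if b >= head)
        # Blocks already passed must wait for the next pass.
        self.deferred = sorted(b for b in pending if b < head)

    def submit(self, block):
        """Queue a new request; return True if it joins the current pass."""
        if block >= self.head:
            bisect.insort(self.current, block)
            return True
        bisect.insort(self.deferred, block)
        return False

    def advance(self):
        """Serve the next block and return it, or None if nothing is queued."""
        if not self.current:
            # Current pass is finished: wrap around and start the next one.
            self.current, self.deferred = self.deferred, []
        if not self.current:
            return None
        self.head = self.current.pop(0)
        return self.head


sweep = Sweep(head=100, pending=[500, 300])
first = sweep.advance()            # head moves to block 300
joined = sweep.submit(400)         # True: 400 lies between 300 and 500
missed = sweep.submit(200)         # False: the head has already passed 200
rest = [sweep.advance() for _ in range(3)]   # [400, 500, 200]
```

The request for block 400 is picked up on the way to 500 at no extra
cost in this model, while block 200 has to wait for the wrap-around.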

My argument is that a sufficiently smart kernel scheduler *should*
yield performance results that are reasonably close to what you can
get with that feature.  Perhaps not quite as good, but reasonably
close.  It shouldn't be an orders-of-magnitude type difference.



--
Kevin Brown                          kevin@sysexperts.com
