Re: How to improve db performance with $7K? - Mailing list pgsql-performance

From Tom Lane
Subject Re: How to improve db performance with $7K?
Msg-id 28523.1113532916@sss.pgh.pa.us
In response to Re: How to improve db performance with $7K?  (Kevin Brown <kevin@sysexperts.com>)
List pgsql-performance
Kevin Brown <kevin@sysexperts.com> writes:
> Tom Lane wrote:
>> The reason this is so much more of a win than it was when ATA was
>> designed is that in modern drives the kernel has very little clue about
>> the physical geometry of the disk.  Variable-size tracks, bad-block
>> sparing, and stuff like that make for a very hard-to-predict mapping
>> from linear sector addresses to actual disk locations.

> What I mean is that when it comes to scheduling disk activity,
> knowledge of the specific physical geometry of the disk isn't really
> important.

Oh?

Yes, you can probably assume that blocks with far-apart numbers are
going to require a big seek, and you might even be right in supposing
that a block with an intermediate number should be read on the way.
But you have no hope at all of making the right decisions at a more
local level --- say, reading various sectors within the same cylinder
in an optimal fashion.  You don't know where the track boundaries are,
so you can't schedule in a way that minimizes rotational latency.
You're best off throwing all the requests at the drive together and
letting the drive sort it out.
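
To make the rotational-latency point concrete, here is a toy model in Python. Everything in it is an assumption for illustration: the 8-sector tracks, the skew of 3 sectors between tracks, and the `angle` mapping itself are made up, and seek and transfer times are ignored. Real drives do not expose this mapping, which is exactly the problem. The sketch compares servicing four requests in logical-block order (all the kernel can do) against a greedy "shortest rotational wait next" order that only something knowing the geometry can compute.

```python
def angle(lba, sectors_per_track=8, skew=3):
    """Hypothetical mapping from logical block to angular position in [0, 1)."""
    track, offset = divmod(lba, sectors_per_track)
    return ((offset + track * skew) % sectors_per_track) / sectors_per_track


def rotational_wait(order, head=0.0):
    """Total rotation (in revolutions) spent waiting when servicing `order`."""
    total = 0.0
    for lba in order:
        target = angle(lba)
        total += (target - head) % 1.0  # wait for the sector to come around
        head = target
    return total


def drive_order(requests, head=0.0):
    """Greedy order: always take the request with the shortest rotational wait."""
    remaining = list(requests)
    order = []
    while remaining:
        nxt = min(remaining, key=lambda lba: (angle(lba) - head) % 1.0)
        remaining.remove(nxt)
        order.append(nxt)
        head = angle(nxt)
    return order


requests = [1, 6, 9, 14]                # same cylinder, two tracks (assumed)
kernel_order = sorted(requests)         # all the kernel can do: sort by LBA
smart_order = drive_order(requests)     # needs the geometry the kernel lacks
```

In this made-up geometry the logical-block order costs a bit over two revolutions of waiting, while the greedy order costs under one. The greedy choice is not guaranteed to be optimal in general; it is only meant to show that the better order depends on information the kernel does not have.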

This is not to say that there's not a place for a kernel-side scheduler
too.  The drive will probably have a fairly limited number of slots in
its command queue.  The optimal thing is for those slots to be filled
with requests that are in the same area of the disk.  So you can still
get some mileage out of an elevator algorithm that works on logical
block numbers to give the drive requests for nearby block numbers at the
same time.  But there's also a lot of use in letting the drive do its
own low-level scheduling.
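
A minimal sketch of that kernel-side idea, in Python for brevity. It is a one-directional elevator (C-LOOK style) over logical block numbers, not any particular kernel's scheduler; the function name `next_batch` and the `slots` parameter are hypothetical, and the number of command-queue slots would depend on the drive.

```python
from bisect import bisect_left


def next_batch(pending, head_lbn, slots):
    """Pick up to `slots` pending requests for one upward elevator sweep.

    pending  -- logical block numbers waiting in the kernel queue
    head_lbn -- logical block number of the last request sent to the drive
    slots    -- free slots in the drive's command queue (assumed value)
    """
    ordered = sorted(pending)
    # First pending request at or above the current position.
    start = bisect_left(ordered, head_lbn)
    # Nothing left above us: wrap around to the lowest block number.
    if start == len(ordered):
        start = 0
    # Consecutive entries in sorted order are the nearest neighbours by
    # logical block number, so the drive gets requests from one area.
    return ordered[start:start + slots]


pending = [9000, 120, 135, 4000, 128, 8990]
batch = next_batch(pending, head_lbn=100, slots=3)
```

Here `batch` holds the three requests clustered around block 120, and the drive is free to reorder them internally using geometry the kernel cannot see. The far-away requests wait for a later sweep.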

> My argument is that a sufficiently smart kernel scheduler *should*
> yield performance results that are reasonably close to what you can
> get with that feature.  Perhaps not quite as good, but reasonably
> close.  It shouldn't be an orders-of-magnitude type difference.

That might be the case with respect to decisions about long seeks,
but not with respect to rotational latency.  The kernel simply hasn't
got the information.

            regards, tom lane
