Rosser Schwarz wrote:
> while you weren't looking, Kevin Brown wrote:
>
> [reordering bursty reads]
>
> > In other words, it's a corner case that I strongly suspect
> > isn't typical in situations where SCSI has historically made a big
> > difference.
>
> [...]
>
> > But I rather doubt that has to be a huge penalty, if any. When a
> > process issues an fsync (or even a sync), the kernel doesn't *have* to
> > drop everything it's doing and get to work on it immediately. It
> > could easily gather a few more requests, bundle them up, and then
> > issue them.
>
> To make sure I'm following you here, are you or are you not suggesting
> that the kernel could sit on -all- IO requests for some small handful
> of ms before actually performing any IO to address what you "strongly
> suspect" is a "corner case"?

The kernel *can* do so. Whether or not it's a good idea depends on
the activity in the system. You'd only consider doing this if you
didn't already have a relatively large backlog of I/O requests to
handle. You wouldn't do this for every I/O request.

Consider this: I/O operations to a block device are so slow compared
with the speed of other (non I/O) operations on the system that the
system can easily wait for, say, a hundredth of the typical latency on
the target device before issuing requests to it and not have any real
negative impact on the system's I/O throughput. A process running on
my test system, a 3 GHz Xeon, can issue a million read system calls
per second (I've measured it; I can post the rather trivial source
code if you're interested). That's the full round trip of issuing the
system call and having the kernel return. That means that in the
span of a millisecond, the system could receive 1000 requests if the
system were busy enough. If the average latency for a random read
from the disk (including head movement and everything) is 10
milliseconds, and we decide to delay the issuance of the first I/O
request for a tenth of a millisecond (a hundredth of the latency),
then the system might receive 100 additional I/O requests, which it
could then put into the queue and sort by block address before issuing
the read request. As long as the system knows which block was last
requested from that physical device, it can order the requests
properly and then begin issuing them. Since the latency on
the target device is so high, this is likely to be a rather big win
for overall throughput.
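
To make the arithmetic and the ordering step concrete, here is a
minimal sketch in Python. It is an illustration only: the function
names, the sample block numbers, and the simple one-way sweep policy
are assumptions made for the example, not a description of what any
particular kernel actually does.

```python
# Illustrative sketch only: the names and the one-way sweep policy are
# assumptions for this example, not real kernel behaviour.

def requests_during_delay(syscalls_per_second, delay_seconds):
    """Upper bound on requests that could arrive while the first is held."""
    return round(syscalls_per_second * delay_seconds)


def order_by_block(pending_blocks, last_block):
    """Order queued block addresses for one forward sweep from last_block.

    Blocks at or beyond the last serviced block come first, in ascending
    order; blocks behind it are picked up afterwards, also ascending.
    """
    ahead = sorted(b for b in pending_blocks if b >= last_block)
    behind = sorted(b for b in pending_blocks if b < last_block)
    return ahead + behind


if __name__ == "__main__":
    # Figures from the text: a million calls per second, 10 ms average
    # latency, and a delay of one hundredth of that latency (0.1 ms).
    latency = 0.010
    delay = latency / 100
    print(requests_during_delay(1_000_000, delay))
    # Hypothetical queue of block addresses gathered during the delay.
    print(order_by_block([900, 120, 560, 30, 610], last_block=500))
```

The point of the sketch is only that a delay that is tiny relative to
the device latency can still gather enough requests to make sorting
worthwhile; a real scheduler would also have to bound how long any
single request can be passed over.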
--
Kevin Brown kevin@sysexperts.com