On Mon, Aug 29, 2005 at 09:42:46AM -0400, Rémy Beaumont wrote:
>We have been trying to pinpoint what originally seemed to be an I/O
>bottleneck but which now seems to be an issue with either Postgresql or
>RHES 3.
Nope, it's an IO bottleneck.
>The behavior we see is that when running queries that do random reads
>on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
>throughput below 3000 kB/s
That's the sign of an IO bottleneck.
>The stats of the NetApp do confirm that it is sitting idle. Doing an
>strace on the Postgresql process shows that it is doing seeks and
>reads.
>
>So my question is where is this iowait time spent ?
Waiting for the seeks. Postgres doesn't do async IO, so it requests a
block, waits for it to come in, then requests another block, etc. The
utilization on the netapp isn't going to be high because it doesn't have
a queue of requests and can't do readahead because the IO is random. The
only way to improve the situation would be to reduce the latency of the
seeks. If I read the numbers right you're only getting about 130
seeks/s, which ain't great. I don't know how much latency the netapp
adds in this configuration; have you tried benchmarking
direct-attach disks?
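
To put rough numbers on that, here is a back-of-the-envelope sketch of
what one-request-at-a-time random reads work out to. The function names
are made up for illustration, and the 8 kB read size is an assumption
(it is the PostgreSQL default block size; yours may differ). It is a
simple model, not a measurement of your setup.

```python
# Rough model of synchronous random reads: one request in flight at a time,
# so throughput is bounded by per-request latency, not by device bandwidth.

# Assumption: each read fetches one 8 kB block (PostgreSQL's default).
BLOCK_KB = 8


def per_request_latency_ms(seeks_per_sec):
    """Average time per request, in ms, when requests are issued serially."""
    return 1000.0 / seeks_per_sec


def sync_throughput_kb_s(seeks_per_sec, block_kb=BLOCK_KB):
    """Throughput in kB/s when each request returns one block."""
    return seeks_per_sec * block_kb


def seeks_needed(target_kb_s, block_kb=BLOCK_KB):
    """Requests per second needed to reach a target throughput."""
    return target_kb_s / block_kb


if __name__ == "__main__":
    rate = 130  # the estimate quoted above
    print("latency per request: %.1f ms" % per_request_latency_ms(rate))
    print("throughput: %d kB/s" % sync_throughput_kb_s(rate))
    print("seeks/s for 3000 kB/s: %.0f" % seeks_needed(3000))
```

Under those assumptions, 130 seeks/s is roughly 7.7 ms per request and
only about 1 MB/s of data, which is why the array can look idle while
the client sits in iowait. The only knob in this model is the latency.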
Mike Stone