Re: Hardware/OS recommendations for large databases ( - Mailing list pgsql-performance

From:           Alan Stange
Subject:        Re: Hardware/OS recommendations for large databases (
Date:
Msg-id:         437FE2E4.5040600@rentec.com
In response to: Re: Hardware/OS recommendations for large databases ( ("Luke Lonergan" <llonergan@greenplum.com>)
Responses:      Re: Hardware/OS recommendations for large databases (
                Re: Hardware/OS recommendations for large databases (
List:           pgsql-performance
Luke Lonergan wrote:
> Alan,
>
> On 11/18/05 11:39 AM, "Alan Stange" <stange@rentec.com> wrote:
>
>> Yes and no.  The one cpu is clearly idle.  The second cpu is 40% busy
>> and 60% idle (aka iowait in the above numbers).
>
> The "aka iowait" is the problem here - iowait is not idle (otherwise it
> would be in the "idle" column).
>
> Iowait is time spent waiting on blocking io calls.  As another poster
> pointed out, you have a two CPU system, and during your scan, as
> predicted, one CPU went 100% busy on the seq scan.  During iowait
> periods, the CPU can be context switched to other users, but as I
> pointed out earlier, that's not useful for getting response on decision
> support queries.

iowait time is idle time.  Period.  This point has been debated endlessly
for Solaris and other OSes as well.  Here's the man page:

    %iowait
        Show the percentage of time that the CPU or CPUs were idle
        during which the system had an outstanding disk I/O request.

If the system had some other cpu-bound work to perform, you wouldn't ever
see any iowait time.  Anyone claiming the cpu was 100% busy on the
sequential scan based on the one set of numbers I posted is
misunderstanding the actual metrics.

> Thanks for your data, it exemplifies many of the points brought up:
> - Lots of disks and expensive I/O hardware does not help improve
> performance on large table queries because I/O bandwidth does not scale
> beyond 110-120MB/s on the fastest CPUs

I don't think that is the conclusion to draw from the anecdotal numbers I
posted.  This file subsystem doesn't perform as well as expected for any
tool.  Bonnie, dd, star, etc. don't get a better data rate either.  In
fact, the storage system wasn't built for performance; it was built to
reliably hold a big chunk of data.  Even so, postgresql is reading at
130MB/s on it, using about 30% of a single cpu, almost all of which is
system time.  I would get the same 130MB/s on a system with cpus that
were substantially slower; the limitation isn't the cpus, or postgresql.
It's the IO system that is poorly configured for this test, not
postgresql's ability to use it.

In fact, given the numbers I posted, it's clear this system could handily
generate more than 120 MB/s using a single cpu given a better IO
subsystem; it has cpu time to spare.

A simple test can be done: build the database in /dev/shm and time the
scans.  It's the same read() system call being used, and now one has made
the IO system "infinitely fast".  The claim being made is that standard
postgresql is unable to generate more than 120MB/s of IO on any IO
system, due to inefficient use of the kernel API, excessive memory
copies, etc.  Having the database on a ram-based file system is an
example of "expensive IO hardware" and all else would be the same.  Hmmm,
now that I think about this, I could throw a medium-sized table onto
/dev/shm using tablespaces on one of our 8GB linux boxes.  So why is this
experiment not valid, or what is it about the above assertion that I am
missing?

Anyway, if one cares about high speed sequential IO, then one should use
a much larger block size to start.  Using 8KB IOs is inappropriate for
such a configuration.  We happen to be using 32KB blocks on our largest
database and it's been the best move for us.

-- Alan
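[For concreteness, here is one way the /dev/shm tablespace experiment
proposed above could be run.  This is a sketch only: the directory,
tablespace, and table names are illustrative, it assumes a Linux box with
tmpfs mounted at /dev/shm and a PostgreSQL superuser, and the source
table must be small enough to fit in RAM.]

    # Create a directory on the RAM-backed filesystem for PostgreSQL
    # to use as a tablespace (run as root; path is illustrative).
    mkdir /dev/shm/pgtest
    chown postgres /dev/shm/pgtest

    # Build a copy of a medium-sized table inside the RAM tablespace.
    # default_tablespace makes the new table land in /dev/shm/pgtest.
    psql -d test <<'SQL'
    CREATE TABLESPACE ramts LOCATION '/dev/shm/pgtest';
    SET default_tablespace = ramts;
    -- 'some_medium_table' is a stand-in for an existing table.
    CREATE TABLE bigscan AS SELECT * FROM some_medium_table;
    SQL

    # Time a full sequential scan; with the data on tmpfs, the same
    # read() calls are served from RAM, so any throughput ceiling seen
    # here is in postgresql itself, not the IO hardware.
    time psql -d test -c 'SELECT count(*) FROM bigscan;'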