Re: Hardware/OS recommendations for large databases ( - Mailing list pgsql-performance
From | Alan Stange |
---|---|
Subject | Re: Hardware/OS recommendations for large databases ( |
Date | |
Msg-id | 438241E5.2010701@rentec.com |
In response to | Re: Hardware/OS recommendations for large databases ( ("Luke Lonergan" <llonergan@greenplum.com>) |
Responses | Re: Hardware/OS recommendations for large databases ( ; Re: Hardware/OS recommendations for large databases ( |
List | pgsql-performance |
Luke, it's time to back yourself up with some numbers. You're claiming that significant portions of postgresql need to be rewritten, and you haven't done the work to make that case.

You've apparently made some mistakes in using dd to benchmark a storage system. Use lmdd, umount the file system before the read test so the read isn't served from the page cache, and post your results. Using a file 2x the size of memory doesn't work correctly. You can quote any other numbers you want, but until you use lmdd correctly you should be ignored. Ideally, since postgresql uses 1GB files, you'll want to use 1GB files for dd as well.

Luke Lonergan wrote:
> Alan,
>
> On 11/21/05 6:57 AM, "Alan Stange" <stange@rentec.com> wrote:
>
>> $ time dd if=/dev/zero of=/fidb1/bigfile bs=8k count=800000
>> 800000+0 records in
>> 800000+0 records out
>>
>> real 0m13.780s
>> user 0m0.134s
>> sys 0m13.510s
>>
>> Oops. I just wrote 470MB/s to a file system that has peak write speed
>> of 200MB/s peak.
>>
> How much RAM on this machine?

Doesn't matter. If you're trying to measure the speed of the disk subsystem, the result will always be wrong without a call to sync() or fsync() before the close(). Add that sync() and the result will be correct for any memory size. Just for completeness: Solaris implicitly calls sync() as part of close. Bonnie used to get this wrong, so quoting Bonnie isn't any good.

Note that on some systems using 2x memory for these tests is almost OK. For example, Solaris used to have a high-water mark that would throttle processes and not allow more than a few hundred KB of writes to be outstanding on a file. Linux/XFS clearly allows a lot of write data to be outstanding. It's better to understand the tools, what they do, and why they can be wrong than to simply quote some other tool that makes the same mistakes.

I find that postgresql is able to achieve about 175MB/s on average from a system capable of delivering 200MB/s peak, and it does this with a lot of CPU time to spare.
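A minimal Python sketch of the sync() point above, assuming a POSIX-like system. It times the same sequential 8KB writes twice: once stopping the clock as soon as the kernel has accepted the data (what a plain dd run measures), and once after os.fsync() has flushed the data to stable storage before close() (comparable to lmdd's sync=1). The function name write_throughput and its parameters are hypothetical and for illustration only; this is not how dd or lmdd are implemented.

```python
import os
import tempfile
import time


def write_throughput(path, block_size=8192, count=2048, sync=False):
    """Write `count` blocks of `block_size` zero bytes to `path`; return MB/s.

    sync=False: the clock stops once the kernel has accepted the writes, so the
    result largely reflects the page cache (what a plain dd run measures).
    sync=True: os.fsync() is called before close(), so the clock also covers
    flushing the data to the storage device (comparable to lmdd sync=1).
    """
    block = b"\0" * block_size
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
        if sync:
            os.fsync(fd)  # block until the data has reached stable storage
    finally:
        os.close(fd)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    total_mb = block_size * count / 1e6  # decimal MB, as lmdd reports it
    return total_mb / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bigfile")
        print(f"no sync:   {write_throughput(path):.1f} MB/s")
        print(f"with sync: {write_throughput(path, sync=True):.1f} MB/s")
```

To measure a real array, point path at a file on the file system under test. The temporary directory used in the demo may sit on tmpfs or another memory-backed file system, where even the fsync() figure says nothing about the disks.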
Maybe dd can do a little better and deliver 185MB/s. If I were to double the speed of my IO system, I might find that a single postgresql instance can sink about 300MB/s of data (based on the last numbers I posted). That's why I have multi-CPU Opterons and more than one query/client: they soak up the remaining IO capacity.

It is guaranteed that postgresql will hit some performance threshold in the future and that rewrites of some core functionality may be needed then, but no numbers posted here so far have made the case that postgresql is in trouble now. In the meantime, build balanced systems with CPUs that match the capabilities of the storage subsystems, use 32KB block sizes for large-memory databases that do lots of sequential scans, use file systems tuned for large files, use Opterons, etc.

As always, one has to post some numbers. Here's an example of how dd doesn't do what you might expect:

mite02:~ # lmdd if=internal of=/fidb2/bigfile bs=8k count=2k
16.7772 MB in 0.0235 secs, 714.5931 MB/sec

mite02:~ # lmdd if=internal of=/fidb2/bigfile bs=8k count=2k sync=1
16.7772 MB in 0.1410 secs, 118.9696 MB/sec

Both numbers are "correct". But the first measures the kernel's ability to absorb 2048 8KB writes with no guarantee that the data is on disk, and the second measures the disk subsystem's ability to write 16MB of data. dd is equivalent to the first result. You can't use the first type of result and complain that postgresql is slow. If you wrote 16G of data on a machine with 8G of memory, then your dd result is possibly too fast by a factor of two, as 8G of the data might not be on disk yet. We won't know until you post some results.

Cheers,

-- Alan
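The arithmetic behind these figures can be checked with a few lines of Python. The helper names below are hypothetical. apparent_mb_per_s() reproduces the naive throughput figure (bytes divided by elapsed seconds, in decimal MB, which is how 2048 x 8KB comes out as 16.7772 MB). max_overstatement() encodes the factor-of-two reasoning under one stated assumption: at most ram_bytes of the written data can still be cached when the timer stops, so at least the remainder really reached the disk in that time.

```python
def apparent_mb_per_s(total_bytes, seconds):
    """Throughput as a naive dd/lmdd timing reports it, in decimal MB/s."""
    return total_bytes / seconds / 1e6


def max_overstatement(bytes_written, ram_bytes):
    """Upper bound on how much an un-synced timing can overstate disk speed.

    Assumption (for illustration): at most `ram_bytes` of the data can still
    be sitting in the page cache when the timer stops, so at least
    bytes_written - ram_bytes was actually written to disk in that time.
    """
    on_disk = bytes_written - ram_bytes
    if on_disk <= 0:
        return float("inf")  # all of it could still be in memory
    return bytes_written / on_disk


if __name__ == "__main__":
    # The dd run quoted above: 800000 x 8KB in 13.780 s, about 475.6 MB/s
    print(apparent_mb_per_s(800000 * 8192, 13.780))
    # lmdd without and with sync=1 (its printed times are rounded, so these
    # land close to, but not exactly on, lmdd's own MB/sec figures)
    print(apparent_mb_per_s(2048 * 8192, 0.0235))
    print(apparent_mb_per_s(2048 * 8192, 0.1410))
    # 16G written on a machine with 8G of memory: at most 2.0x too fast
    print(max_overstatement(16 * 2**30, 8 * 2**30))
```

The dd figure lands well above the 200MB/s the array can sustain, which is itself the sign that the clock stopped before the data was on disk.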