Re: Hardware/OS recommendations for large databases ( - Mailing list pgsql-performance
From | Alan Stange |
---|---|
Subject | Re: Hardware/OS recommendations for large databases ( |
Date | |
Msg-id | 4381E077.80009@rentec.com Whole thread Raw |
In response to | Re: Hardware/OS recommendations for large databases ( ("Luke Lonergan" <llonergan@greenplum.com>) |
Responses |
Re: Hardware/OS recommendations for large databases (
Re: Hardware/OS recommendations for large databases ( |
List | pgsql-performance |
Luke Lonergan wrote: > OK - slower this time: > We've seen between 110MB/s and 120MB/s on a wide variety of fast CPU > machines with fast I/O subsystems that can sustain 250MB/s+ using dd, but > which all are capped at 120MB/s when doing sequential scans with different > versions of Postgres. > Postgresql issues the exact same sequence of read() calls as does dd. So why is dd so much faster? I'd be careful with the dd read of a 16GB file on an 8GB system. Make sure you umount the file system first, to make sure all of the file is flushed from memory. Some systems use a freebehind on sequential reads to avoid flushing memory...and you'd find that 1/2 of your 16GB file is still in memory. The same point also holds for the writes: when dd finishes not all the data is on disk. You need to issue a sync() call to make that happen. Use lmdd to ensure that the data is actually all written. In other words, I think your dd results are possibly misleading. It's trivial to demonstrate: $ time dd if=/dev/zero of=/fidb1/bigfile bs=8k count=800000 800000+0 records in 800000+0 records out real 0m13.780s user 0m0.134s sys 0m13.510s Oops. I just wrote 470MB/s to a file system that has peak write speed of 200MB/s peak. Now, you might say that you wrote a 16GB file on an 8 GB machine so this isn't an issue. It does make your dd numbers look fast as some of the data will be unwritten. I'd also suggest running dd on the same files as postgresql. I suspect you'd find that the layout of the postgresql files isn't that good as they are grown bit by bit, unlike the file created by simply dd'ing a large file. > Understand my point: It doesn't matter that there is idle or iowait on the > CPU, the postgres executor is not able to drive the I/O rate for two > reasons: there is a lot of CPU used for the scan (the 40% you reported) and > a lack of asynchrony (the iowait time). That means that by speeding up the > CPU you only reduce the first part, but you don't fix the second and v.v. > > With more aggressive readahead, the second problem (the I/O asynchrony) is > handled better by the Linux kernel and filesystem. That's what we're seeing > with XFS. I think your point doesn't hold up. Every time you make it, I come away posting another result showing it to be incorrect. The point your making doesn't match my experience with *any* storage or program I've ever used, including postgresql. Your point suggests that the storage system is idle and that postgresql is broken because it isn't able to use the resources available...even when the cpu is very idle. How can that make sense? The issue here is that the storage system is very active doing reads on the files...which might be somewhat poorly allocated on disk because postgresql grows the tables bit by bit. I had the same readahead in Reiser and in XFS. The XFS performance was better because XFS does a better job of large file allocation on disk, thus resulting in many fewer seeks (generated by the file system itself) to read the files back in. As an example, some file systems like UFS purposely scatter large files across cylinder groups to avoid forcing large seeks on small files; one can tune this behavior so that large files are more tightly allocated. Of course, because this is engineering, I have another obligatory data point: This time it's a 4.2GB table using 137,138 32KB pages with nearly 41 million rows. A "select count(1)" on the table completes in 14.6 seconds, for an average read rate of 320 MB/s. One cpu was idle, the other averaged 32% system time and 68 user time for the 14 second period. This is on a 2.2Ghz Opteron. A faster cpu would show increased performance as I really am cpu bound finally. Postgresql is clearly able to issue the relevant sequential read() system calls and sink the resulting data without a problem if the file system is capable of providing the data. It can do this up to a speed of ~300MB/s on this class of system. Now it should be fairly simple to tweak the few spots where some excess memory copies are being done and up this result substantially. I hope postgresql is always using the libc memcpy as that's going to be a lot faster then some private routine. -- Alan
pgsql-performance by date: