Re: Effects of setting linux block device readahead size - Mailing list pgsql-performance
From: Greg Smith
Subject: Re: Effects of setting linux block device readahead size
Msg-id: Pine.GSO.4.64.0809101313070.4714@westnet.com
In response to: Re: Effects of setting linux block device readahead size ("Scott Carey" <scott@richrelevance.com>)
List: pgsql-performance
On Wed, 10 Sep 2008, Scott Carey wrote:

> How does that readahead tunable affect random reads or mixed random /
> sequential situations?

It still helps as long as you don't make the parameter giant. The read cache in a typical hard drive nowadays is 8-32MB. If you're seeking a lot, you still might as well read the next 1MB or so after the block requested once you've gone to the trouble of moving the disk somewhere. Seek-bound workloads will only waste a relatively small amount of the disk's read cache that way--the slow seek rate itself keeps those reads from polluting the buffer cache too fast--while sequential ones benefit enormously.

If you look at Mark's tests, you can see approximately where the readahead is filling the disk's internal buffers, because that's where the sequential read performance improvement levels off. That looks to be near 8MB for the array he tested, but I'd like to see a single disk to better feel that out. Basically, once you know that, you back off from there as much as you can without killing sequential performance completely, and that point should still support a mixed workload.

Disks are fairly well understood physical components, and if you think in those terms you can build a gross model easily enough:

Average seek time:    4ms
Seeks/second:         250
Data read/seek:       1MB  (read-ahead number goes here)
Total read bandwidth: 250MB/s

Since that's around what a typical interface can support, that's why I suggest a 1MB read-ahead shouldn't hurt even seek-only workloads, and it's pretty close to optimal for sequential as well here (a big improvement over the default Linux RA of 256 blocks=128K). If you know your workload is biased heavily toward sequential scans, you might pick the 8MB read-ahead instead.
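The gross model above works out like this (a back-of-the-envelope sketch only; like the original numbers, it ignores transfer time, queueing, and rotational latency):

```python
def seek_bound_bandwidth_mb_s(avg_seek_ms, readahead_mb):
    """Throughput of a purely seek-bound drive that reads `readahead_mb`
    of data after every seek.  Simplification: transfer time is ignored,
    so seeks/second is just 1000 / avg_seek_ms."""
    seeks_per_second = 1000.0 / avg_seek_ms
    return seeks_per_second * readahead_mb

# 4ms average seek, 1MB read per seek -> 250 seeks/s * 1MB = 250 MB/s,
# roughly what a typical interface of the era could support.
print(seek_bound_bandwidth_mb_s(4.0, 1.0))  # 250.0
```

Plugging in the 8MB read-ahead instead gives a number far beyond what the interface can move, which is one way to see why the larger setting only pays off for sequential scans.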
That value (--setra=16384 -> 8MB) has actually been the standard "start here" setting 3ware has suggested on Linux for a while now: http://www.3ware.com/kb/Article.aspx?id=11050

> I would be very interested in a mixed fio profile with a "background writer"
> doing moderate, paced random and sequential writes combined with concurrent
> sequential reads and random reads.

Trying to make disk benchmarks really complicated is a path that leads to a lot of wasted time. I once made a gigantic design plan for something that worked like the PostgreSQL buffer management system, intended as a disk benchmarking tool. I threw it away after confirming I could do better with carefully scripted pgbench tests. If you want to benchmark something that looks like a database workload, benchmark a database workload. That will always be better than guessing what such a workload acts like in a synthetic fashion. The "seeks/second" number bonnie++ spits out is good enough for most purposes at figuring out if you've detuned seeks badly. "pgbench -S" run against a giant database gives results that look a lot like seeks/second, and if you mix multiple custom -f tests together it will round-robin between them at random...

It's really helpful to measure these various disk subsystem parameters individually. Knowing the sequential read/write, seeks/second, and commit rates for a disk setup is mainly valuable for making sure you're getting the full performance expected from what you've got. Like in this example, where something was obviously off on the single-disk results because reads were significantly slower than writes. That's not supposed to happen, so you know something basic is wrong before you even get into RAID and such. Beyond confirming whether or not you're getting approximately what you should be out of the basic hardware, disk benchmarks are much less useful than application ones.

With all that, I think I just gave away what the next conference paper I've been working on is about.
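As a sanity check on the --setra units mentioned above: Linux readahead is specified in 512-byte sectors, which is how 16384 becomes 8MB and the default 256 becomes 128K. A quick conversion:

```python
def setra_to_kb(setra_sectors):
    """Convert a blockdev --setra value (512-byte sectors) to kilobytes."""
    return setra_sectors * 512 // 1024

print(setra_to_kb(256))    # 128   -> the default Linux RA of 128K
print(setra_to_kb(16384))  # 8192  -> 8MB, the 3ware-suggested starting point
```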
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD