Thread: Linux I/O schedulers - CFQ & random seeks
Hi Guys,

I'm in the process of setting up some new hardware and am just doing some
basic disk performance testing with bonnie++ to start with.

I'm seeing a massive difference on the random seeks test, with CFQ not
performing very well as far as I can see. The thing is, I didn't see this
sort of massive divide when doing tests on our current hardware.

Current hardware: 2x quad-core E5420 @ 2.5GHz / 32GB RAM / Adaptec 5805Z w/ 512MB / RAID 10 / 8x 15k 3.5" disks
New hardware: 4x 8-core X7550 @ 2.0GHz / 128GB RAM / H700 w/ 1GB / RAID 10 / 12x 15k 2.5" disks

Admittedly, my testing on our current hardware was on 2.6.26 and on the new
hardware it's on 2.6.32 - I think I'm going to have to check the current
hardware on the older kernel too.

I'm wondering (and this may be a can of worms) what people's opinions are
on these schedulers? I'm going to have to do some real world testing myself
with postgresql too, but initially I was thinking of switching from our
current CFQ back to deadline.

Any opinions would be appreciated. Regardless, here are some sample results
from the new hardware:

CFQ:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Way5ax      258376M   666  99 434709  96 225498  35  2840  69 952115  76 556.2   3
Latency             12344us     619ms     522ms     255ms     425ms     529ms
Version  1.96       ------Sequential Create------ --------Random Create--------
Way5ax              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 28808  41 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              6170us     594us     633us    7619us      20us      36us
1.96,1.96,Way5ax,1,1299173113,258376M,,666,99,434709,96,225498,35,2840,69,952115,76,556.2,3,16,,,,,28808,41,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,12344us,619ms,522ms,255ms,425ms,529ms,6170us,594us,633us,7619us,20us,36us

deadline:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Way5ax      258376M   696  99 449914  96 287010  47  2952  69 989527  78  2304  19
Latency             11939us     856ms     570ms     174ms     228ms   24744us
Version  1.96       ------Sequential Create------ --------Random Create--------
Way5ax              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 31338  45 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              5605us     605us     627us    6590us      19us      38us
1.96,1.96,Way5ax,1,1299237441,258376M,,696,99,449914,96,287010,47,2952,69,989527,78,2304,19,16,,,,,31338,45,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,11939us,856ms,570ms,174ms,228ms,24744us,5605us,605us,627us,6590us,19us,38us

no-op:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Way5ax      258376M   706  99 451578  95 303351  49  4104  96 1003688 78  2294  19
Latency             11538us     530ms    1460ms   12141us     350ms   22969us
Version  1.96       ------Sequential Create------ --------Random Create--------
Way5ax              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 31137  44 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              5918us     597us     627us    5039us      17us      36us
1.96,1.96,Way5ax,1,1299245225,258376M,,706,99,451578,95,303351,49,4104,96,1003688,78,2294,19,16,,,,,31137,44,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,11538us,530ms,1460ms,12141us,350ms,22969us,5918us,597us,627us,5039us,17us,36us

--
Glyn
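For anyone reproducing the comparison above: the active scheduler is a
per-block-device setting, switchable at runtime through sysfs. A minimal
sketch, where the device name sdb, the mount point /mnt/test, and the
postgres user are assumptions:

    # the scheduler in [brackets] is active, e.g. "noop anticipatory deadline [cfq]"
    cat /sys/block/sdb/queue/scheduler
    # switch for this boot only; put elevator=deadline on the kernel
    # command line to make it the default across reboots
    echo deadline > /sys/block/sdb/queue/scheduler
    # rerun the benchmark; -s of roughly 2x RAM keeps the page cache
    # out of the numbers, and -u is required when running as root
    bonnie++ -d /mnt/test -s 258376M -u postgres

Repeating the identical run once per scheduler, as in the results above,
keeps the comparison apples-to-apples.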
On 03/04/11 10:34, Glyn Astill wrote:
> I'm wondering (and this may be a can of worms) what people's opinions
> are on these schedulers?

When testing our new DB box just last month, we saw a big improvement in
bonnie++ random I/O rates when using the noop scheduler instead of cfq (or
any other). We've got RAID 10/12 on a 3ware card w/ battery-backed cache;
7200rpm drives. Our file system is XFS with
noatime,nobarrier,logbufs=8,logbsize=256k.

How much is "big"? I can't find my notes for it, but I recall that the
difference was large enough to surprise us.

We're running with noop in production right now. No complaints.
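A sketch of the matching /etc/fstab entry for those mount options; the
device and mount point are assumptions, and nobarrier is only reasonable
with a battery-backed write cache like the one described:

    # XFS data volume, options as above (device and mount point assumed)
    /dev/sdb1  /var/lib/pgsql  xfs  noatime,nobarrier,logbufs=8,logbsize=256k  0  0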
On 3/4/11 11:03 AM, Wayne Conrad wrote:
> On 03/04/11 10:34, Glyn Astill wrote:
>> I'm wondering (and this may be a can of worms) what people's opinions
>> are on these schedulers?
>
> When testing our new DB box just last month, we saw a big improvement
> in bonnie++ random I/O rates when using the noop scheduler instead of
> cfq (or any other). [...] We're running with noop in production right
> now. No complaints.

Just another anecdote: I found that the deadline scheduler performed the
best for me. I don't have the benchmarks anymore, but deadline vs. cfq was
dramatically faster in my tests. I posted this to the list years ago and
others reported similar experiences. Noop was a close 2nd to deadline.

My setup:

XFS (noatime,nodiratime,nobarrier,logbufs=8)
391GB db cluster directory
BBU caching RAID 10, 12-disk SAS
128GB RAM
Constant insert stream
OLAP-ish query patterns
Heavy random I/O
On Fri, Mar 4, 2011 at 11:39 AM, Dan Harris <fbsd@drivefaster.net> wrote:
> Just another anecdote: I found that the deadline scheduler performed the
> best for me. I don't have the benchmarks anymore, but deadline vs. cfq
> was dramatically faster in my tests. I posted this to the list years ago
> and others reported similar experiences. Noop was a close 2nd to deadline.

This reflects the results I get with a battery-backed caching RAID
controller as well, both Areca and LSI. Noop seemed to scale a little bit
better for me than deadline with larger loads, but they were pretty much
within a few percent of each other either way. CFQ was also much slower
for us.
Dan Harris <fbsd@drivefaster.net> wrote:
> Just another anecdote: I found that the deadline scheduler performed the
> best for me. I don't have the benchmarks anymore, but deadline vs. cfq
> was dramatically faster in my tests. I posted this to the list years ago
> and others reported similar experiences. Noop was a close 2nd to deadline.

That was our experience when we benchmarked a few years ago. Some more
recent benchmarks seem to have shown improvements in cfq, but we haven't
had enough of a problem with our current setup to make it seem worth the
effort of running another set of benchmarks on that.

-Kevin
On Fri, Mar 4, 2011 at 10:34 AM, Glyn Astill <glynastill@yahoo.co.uk> wrote:
> I'm wondering (and this may be a can of worms) what people's opinions are
> on these schedulers? I'm going to have to do some real world testing
> myself with postgresql too, but initially I was thinking of switching
> from our current CFQ back to deadline.

It was a few years ago now, but I went through a similar round of testing,
and thought CFQ was fine, until I deployed the box. It fell on its face,
hard.

I can't find a reference offhand, but I remember reading somewhere that
CFQ is optimized for more desktop-type workloads, and that in its efforts
to ensure fair IO access for all processes, it can actively interfere with
high-concurrency workloads like you'd expect to see on a DB server --
especially one as big as your specs indicate. Then again, it's been a few
years, so the scheduler may have improved significantly in that span.

My standard approach since has just been to use no-op. We've shelled out
enough money for a RAID controller, if not a SAN, so it seems silly to me
not to defer to the hardware and let it do its job. With big caches,
command queueing, and direct knowledge of how the data is laid out on the
spindles, I'm hard-pressed to imagine a scenario where the kernel is going
to be able to do a better job of IO prioritization than the controller.

I'd absolutely recommend testing with pg, so you can get a feel for how it
behaves under real-world workloads; see the pgbench sketch below. The
critical thing there is that your testing needs to create workloads that
are in the neighborhood of what you'll see in production. In my case, the
final round of testing included something like 15-20% of the user-base for
the app the db served, and everything seemed fine. Once we opened the
flood-gates and all the users were hitting the new db, though, nothing
worked for anyone. Minute-plus page-loads across the board, when people
weren't simply timing out.

As always, YMMV, the plural of anecdote isn't data, &c.

rls

--
:wq
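That kind of concurrent testing is commonly approximated with pgbench
before real traffic arrives. A minimal sketch -- the database name, scale
factor, client count, and run length are placeholder assumptions to tune
toward the production workload:

    # create and populate a test database at scale factor 100 (~1.5GB)
    createdb pgbench_test
    pgbench -i -s 100 pgbench_test
    # 32 concurrent clients, 4 worker threads, 10-minute run;
    # repeat once per scheduler and compare the reported tps
    pgbench -c 32 -j 4 -T 600 pgbench_test

pgbench's built-in transaction mix is far simpler than most applications,
so it tends to understate the gap a production workload can expose; treat
it as a smoke test, not a substitute for the flood-gates step above.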
On Sat, Mar 5, 2011 at 6:09 AM, Rosser Schwarz <rosser.schwarz@gmail.com> wrote:
> It was a few years ago now, but I went through a similar round of
> testing, and thought CFQ was fine, until I deployed the box. It fell
> on its face, hard. [...]
>
> As always, YMMV, the plural of anecdote isn't data, &c.

I have a somewhat similar story. :)

We recently upgraded to RHEL 6 (2.6.32 + patches) from RHEL 5.6. Our
machines are:

24 core (4x6) X5670 @ 2.93GHz
144GB of RAM
2 x RAID 1 SAS - WAL (on a 5405Z)
8 x RAID 10 SAS - Data (on a 5805Z)

We decided to test CFQ again (after using the deadline scheduler) and it
looked good in normal file system testing and whatnot. Once we ramped up
production traffic on the machines, PostgreSQL pretty much died under the
load and could never get to a steady state. I think this had something to
do with the PG backends not having enough I/O bandwidth (due to CFQ) to
put data into cache fast enough. This went on for an hour before we
decided to switch back to deadline. The system was back to normal working
order (with 5-6x the I/O throughput of CFQ) in about 3 minutes, after
which I/O wait was down to 0-1%.

We run a (typical?) OLTP workload for a web app and see something like
2000 to 5000 req/s against PG.

Not sure if this helps in the OP's situation, but I guess it's one of
those things you need to test with a production workload to find out. :)

Regards,
Omar
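The I/O-wait recovery described here is easy to watch from sysstat while
the scheduler is switched live; a sketch (the 5-second sample interval is
an arbitrary assumption):

    # extended per-device stats plus the CPU line's %iowait, every 5s
    iostat -x 5
    # or just watch the 'wa' column
    vmstat 5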
Hello,

> Once we ramped up production traffic on the machines, PostgreSQL pretty
> much died under the load and could never get to a steady state. [...]
> The system was back to normal working order (with 5-6x the I/O
> throughput of CFQ) in about 3 minutes, after which I/O wait was down
> to 0-1%.
>
> Not sure if this helps in the OP's situation, but I guess it's one of
> those things you need to test with a production workload to find out.
> :)

Me too. :) I tried switching schedulers on a busy Oracle server and
deadline gave us roughly +30% in our case (against CFQ). The DB was on HP
EVA storage. Not a 5-6 fold increase, but still, a "free" +30% is pretty
nice. CentOS 5.5.

Regards,
Mindaugas