Thread: Linux I/O schedulers - CFQ & random seeks
Hi Guys,

I'm in the process of setting up some new hardware and am just doing some
basic disk performance testing with bonnie++ to start with.

I'm seeing a massive difference on the random seeks test, with CFQ not
performing very well as far as I can see. The thing is, I didn't see this
sort of massive divide when doing tests on our current hardware.

Current hardware: 2x quad-core E5420 @ 2.5GHz / 32GB RAM / Adaptec 5805Z w/ 512MB / RAID 10 / 8x 15k 3.5" disks
New hardware: 4x 8-core X7550 @ 2.0GHz / 128GB RAM / H700 w/ 1GB / RAID 10 / 12x 15k 2.5" disks

Admittedly, my testing on our current hardware was on 2.6.26 and on the new
hardware it's on 2.6.32 - I think I'm going to have to check the current
hardware on the older kernel too.

I'm wondering (and this may be a can of worms) what people's opinions are
on these schedulers? I'm going to have to do some real world testing myself
with postgresql too, but initially I was thinking of switching from our
current CFQ back to deadline.

Any opinions would be appreciated. Regardless, here are some sample results
from the new hardware:

CFQ:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Way5ax      258376M   666  99 434709  96 225498  35  2840  69 952115  76 556.2   3
Latency             12344us     619ms     522ms     255ms     425ms     529ms
Version  1.96       ------Sequential Create------ --------Random Create--------
Way5ax              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 28808  41 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              6170us     594us     633us    7619us      20us      36us
1.96,1.96,Way5ax,1,1299173113,258376M,,666,99,434709,96,225498,35,2840,69,952115,76,556.2,3,16,,,,,28808,41,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,12344us,619ms,522ms,255ms,425ms,529ms,6170us,594us,633us,7619us,20us,36us

deadline:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Way5ax      258376M   696  99 449914  96 287010  47  2952  69 989527  78  2304  19
Latency             11939us     856ms     570ms     174ms     228ms   24744us
Version  1.96       ------Sequential Create------ --------Random Create--------
Way5ax              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 31338  45 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              5605us     605us     627us    6590us      19us      38us
1.96,1.96,Way5ax,1,1299237441,258376M,,696,99,449914,96,287010,47,2952,69,989527,78,2304,19,16,,,,,31338,45,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,11939us,856ms,570ms,174ms,228ms,24744us,5605us,605us,627us,6590us,19us,38us

no-op:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Way5ax      258376M   706  99 451578  95 303351  49  4104  96 1003688 78  2294  19
Latency             11538us     530ms    1460ms   12141us     350ms   22969us
Version  1.96       ------Sequential Create------ --------Random Create--------
Way5ax              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 31137  44 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              5918us     597us     627us    5039us      17us      36us
1.96,1.96,Way5ax,1,1299245225,258376M,,706,99,451578,95,303351,49,4104,96,1003688,78,2294,19,16,,,,,31137,44,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,11538us,530ms,1460ms,12141us,350ms,22969us,5918us,597us,627us,5039us,17us,36us

--
Glyn
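For anyone reproducing the comparison above: the active scheduler is a
per-block-device setting, switchable at runtime through sysfs. A minimal
sketch, where the device name sdb, the mount point /mnt/test, and the
postgres user are assumptions:

    # the scheduler in [brackets] is active, e.g. "noop anticipatory deadline [cfq]"
    cat /sys/block/sdb/queue/scheduler
    # switch for this boot only; put elevator=deadline on the kernel
    # command line to make it the default across reboots
    echo deadline > /sys/block/sdb/queue/scheduler
    # rerun the benchmark; -s of roughly 2x RAM keeps the page cache
    # out of the numbers, and -u is required when running as root
    bonnie++ -d /mnt/test -s 258376M -u postgres

Repeating the identical run once per scheduler, as in the results above,
keeps the comparison apples-to-apples.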
On 03/04/11 10:34, Glyn Astill wrote:
> I'm wondering (and this may be a can of worms) what people's opinions
> are on these schedulers?

When testing our new DB box just last month, we saw a big improvement in
bonnie++ random I/O rates when using the noop scheduler instead of cfq (or
any other). We've got RAID 10/12 on a 3ware card w/ battery-backed cache;
7200rpm drives. Our file system is XFS with
noatime,nobarrier,logbufs=8,logbsize=256k.

How much is "big"? I can't find my notes for it, but I recall that the
difference was large enough to surprise us.

We're running with noop in production right now. No complaints.
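A sketch of the matching /etc/fstab entry for those mount options; the
device and mount point are assumptions, and nobarrier is only reasonable
with a battery-backed write cache like the one described:

    # XFS data volume, options as above (device and mount point assumed)
    /dev/sdb1  /var/lib/pgsql  xfs  noatime,nobarrier,logbufs=8,logbsize=256k  0  0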
On 3/4/11 11:03 AM, Wayne Conrad wrote:
> On 03/04/11 10:34, Glyn Astill wrote:
>> I'm wondering (and this may be a can of worms) what people's opinions
>> are on these schedulers?
>
> When testing our new DB box just last month, we saw a big improvement
> in bonnie++ random I/O rates when using the noop scheduler instead of
> cfq (or any other). [...] We're running with noop in production right
> now. No complaints.

Just another anecdote: I found that the deadline scheduler performed the
best for me. I don't have the benchmarks anymore, but deadline vs. cfq was
dramatically faster in my tests. I posted this to the list years ago and
others reported similar experiences. Noop was a close 2nd to deadline.

My setup:

XFS (noatime,nodiratime,nobarrier,logbufs=8)
391GB db cluster directory
BBU caching RAID 10, 12-disk SAS
128GB RAM
Constant insert stream
OLAP-ish query patterns
Heavy random I/O
On Fri, Mar 4, 2011 at 11:39 AM, Dan Harris <fbsd@drivefaster.net> wrote:
> Just another anecdote: I found that the deadline scheduler performed the
> best for me. I don't have the benchmarks anymore, but deadline vs. cfq
> was dramatically faster in my tests. I posted this to the list years ago
> and others reported similar experiences. Noop was a close 2nd to deadline.

This reflects the results I get with a battery-backed caching RAID
controller as well, both Areca and LSI. Noop seemed to scale a little bit
better for me than deadline with larger loads, but they were pretty much
within a few percent of each other either way. CFQ was also much slower
for us.
Dan Harris <fbsd@drivefaster.net> wrote:
> Just another anecdote: I found that the deadline scheduler performed the
> best for me. I don't have the benchmarks anymore, but deadline vs. cfq
> was dramatically faster in my tests. I posted this to the list years ago
> and others reported similar experiences. Noop was a close 2nd to deadline.

That was our experience when we benchmarked a few years ago. Some more
recent benchmarks seem to have shown improvements in cfq, but we haven't
had enough of a problem with our current setup to make it seem worth the
effort of running another set of benchmarks on that.

-Kevin
On Fri, Mar 4, 2011 at 10:34 AM, Glyn Astill <glynastill@yahoo.co.uk> wrote:
> I'm wondering (and this may be a can of worms) what people's opinions are
> on these schedulers? I'm going to have to do some real world testing
> myself with postgresql too, but initially I was thinking of switching
> from our current CFQ back to deadline.

It was a few years ago now, but I went through a similar round of testing,
and thought CFQ was fine, until I deployed the box. It fell on its face,
hard.

I can't find a reference offhand, but I remember reading somewhere that
CFQ is optimized for more desktop-type workloads, and that in its efforts
to ensure fair IO access for all processes, it can actively interfere with
high-concurrency workloads like you'd expect to see on a DB server --
especially one as big as your specs indicate. Then again, it's been a few
years, so the scheduler may have improved significantly in that span.

My standard approach since has just been to use no-op. We've shelled out
enough money for a RAID controller, if not a SAN, so it seems silly to me
not to defer to the hardware and let it do its job. With big caches,
command queueing, and direct knowledge of how the data is laid out on the
spindles, I'm hard-pressed to imagine a scenario where the kernel is going
to be able to do a better job of IO prioritization than the controller.

I'd absolutely recommend testing with pg, so you can get a feel for how it
behaves under real-world workloads; see the pgbench sketch below. The
critical thing there is that your testing needs to create workloads that
are in the neighborhood of what you'll see in production. In my case, the
final round of testing included something like 15-20% of the user-base for
the app the db served, and everything seemed fine. Once we opened the
flood-gates and all the users were hitting the new db, though, nothing
worked for anyone. Minute-plus page-loads across the board, when people
weren't simply timing out.

As always, YMMV, the plural of anecdote isn't data, &c.

rls

--
:wq
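That kind of concurrent testing is commonly approximated with pgbench
before real traffic arrives. A minimal sketch -- the database name, scale
factor, client count, and run length are placeholder assumptions to tune
toward the production workload:

    # create and populate a test database at scale factor 100 (~1.5GB)
    createdb pgbench_test
    pgbench -i -s 100 pgbench_test
    # 32 concurrent clients, 4 worker threads, 10-minute run;
    # repeat once per scheduler and compare the reported tps
    pgbench -c 32 -j 4 -T 600 pgbench_test

pgbench's built-in transaction mix is far simpler than most applications,
so it tends to understate the gap a production workload can expose; treat
it as a smoke test, not a substitute for the flood-gates step above.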
On Sat, Mar 5, 2011 at 6:09 AM, Rosser Schwarz <rosser.schwarz@gmail.com> wrote:
> It was a few years ago now, but I went through a similar round of
> testing, and thought CFQ was fine, until I deployed the box. It fell
> on its face, hard. [...]
>
> As always, YMMV, the plural of anecdote isn't data, &c.

I have a somewhat similar story. :)

We recently upgraded to RHEL 6 (2.6.32 + patches) from RHEL 5.6. Our
machines are:

24 core (4x6) X5670 @ 2.93GHz
144GB of RAM
2 x RAID 1 SAS - WAL (on a 5405Z)
8 x RAID 10 SAS - Data (on a 5805Z)

We decided to test CFQ again (after using the deadline scheduler) and it
looked good in normal file system testing and whatnot. Once we ramped up
production traffic on the machines, PostgreSQL pretty much died under the
load and could never get to a steady state. I think this had something to
do with the PG backends not having enough I/O bandwidth (due to CFQ) to
put data into cache fast enough. This went on for an hour before we
decided to switch back to deadline. The system was back to normal working
order (with 5-6x the I/O throughput of CFQ) in about 3 minutes, after
which I/O wait was down to 0-1%.

We run a (typical?) OLTP workload for a web app and see something like
2000 to 5000 req/s against PG.

Not sure if this helps in the OP's situation, but I guess it's one of
those things you need to test with a production workload to find out. :)

Regards,
Omar
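The I/O-wait recovery described here is easy to watch from sysstat while
the scheduler is switched live; a sketch (the 5-second sample interval is
an arbitrary assumption):

    # extended per-device stats plus the CPU line's %iowait, every 5s
    iostat -x 5
    # or just watch the 'wa' column
    vmstat 5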
Hello,

> Once we ramped up production traffic on the machines, PostgreSQL pretty
> much died under the load and could never get to a steady state. [...]
> The system was back to normal working order (with 5-6x the I/O
> throughput of CFQ) in about 3 minutes, after which I/O wait was down
> to 0-1%.
>
> Not sure if this helps in the OP's situation, but I guess it's one of
> those things you need to test with a production workload to find out.
> :)

Me too. :) I tried switching schedulers on a busy Oracle server and
deadline gave us roughly +30% in our case (against CFQ). The DB was on HP
EVA storage. Not a 5-6 fold increase, but still, a "free" +30% is pretty
nice. CentOS 5.5.

Regards,
Mindaugas