Thread: processor running queue - general rule of thumb?
Hey folks,

I'm new to all this stuff, and am sitting here with kSar looking at some graphed results of some load tests we did, trying to figure things out :-)

We got some unsatisfactory results in stressing our system, and now I have to divine where the bottleneck is.

We did 4 tests, upping the load each time. The 3rd and 4th ones have all 8 cores pegged at about 95%. Yikes!

In the first test the processor running queue spikes at 7 and maybe averages 4 or 5.

In the last test it spikes at 33 with an average of maybe 25.

Looks to me like it could be a CPU bottleneck. But I'm new at this :-)

Is there a general rule of thumb "if queue is longer than X, it is likely a bottleneck?"

In reading an IBM Redbook on Linux performance, I also see this:
"High numbers of context switches in connection with a large number of interrupts can signal driver or application issues."

On my first test where the CPU is not pegged, context switching goes from about 3700 to about 4900, maybe averaging 4100.

On the pegged test, the values are maybe 10% higher than that, maybe 15%.

It is an IBM 3550 with 8 cores, 2660.134 MHz (from dmesg), 32 GB RAM.

thanks,
-Alan

--
“Don't eat anything you've ever seen advertised on TV”
- Michael Pollan, author of "In Defense of Food"
Alan McKay wrote:
> Hey folks,
> We did 4 tests, upping the load each time. The 3rd and 4th ones have
> all 8 cores pegged at about 95%. Yikes!
>
> In the first test the processor running queue spikes at 7 and maybe
> averages 4 or 5
>
> In the last test it spikes at 33 with an average maybe 25.
>
> Looks to me like it could be a CPU bottleneck. But I'm new at this :-)
>
> Is there a general rule of thumb "if queue is longer than X, it is
> likely a bottleneck?"
>
> In reading an IBM Redbook on Linux performance, I also see this :
> "High numbers of context switches in connection with a large number of
> interrupts can signal driver or application issues."
>
> On my first test where the CPU is not pegged, context switching goes
> from about 3700 to about 4900, maybe averaging 4100

Well, the people here will need a lot more information to figure out what is going on.

What kind of stress test did you run? Is a specific query causing the problem in the test?
What kind of load? How many simulated clients?
How big is the database?
We need to see postgresql.conf.
What kind of I/O subsystem do you have?
What does vmstat show?
Have you looked at the wiki yet? http://wiki.postgresql.org/wiki/Performance_Optimization
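For anyone answering the "what does vmstat show" question, the two columns most relevant to this thread are `r` (runnable processes, i.e. the run queue) and `cs` (context switches per second). Below is a minimal sketch that pulls those columns out of captured `vmstat` output by header name. The function name `parse_vmstat` is hypothetical, and the sample numbers are made up for illustration; they are not measurements from the system discussed here.

```python
def parse_vmstat(text):
    """Return one dict per sample row, keyed by vmstat's own column names."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # This is the column-name line: r b swpd free ...
            header = fields
        elif header and fields[0].isdigit() and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


# Illustrative sample only (invented values, not data from this thread).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 5  0      0 900000  50000 800000    0    0     1     2 1200 4100 40  5 55  0  0
25  0      0 880000  50000 810000    0    0     1     3 1500 4700 90  5  5  0  0
"""

rows = parse_vmstat(SAMPLE)
for row in rows:
    print("run queue:", row["r"], "context switches/s:", row["cs"])
```

Parsing by header name rather than by fixed position is a deliberate choice here, since the set of columns can differ between vmstat versions.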
On Fri, Jun 19, 2009 at 9:59 AM, Alan McKay<alan.mckay@gmail.com> wrote:
> Hey folks,
>
> I'm new to all this stuff, and am sitting here with kSar looking at
> some graphed results of some load tests we did, trying to figure
> things out :-)
>
> We got some unsatisfactory results in stressing our system, and now I
> have to divine where the bottleneck is.
>
> We did 4 tests, upping the load each time. The 3rd and 4th ones have
> all 8 cores pegged at about 95%. Yikes!
>
> In the first test the processor running queue spikes at 7 and maybe
> averages 4 or 5
>
> In the last test it spikes at 33 with an average maybe 25.
>
> Looks to me like it could be a CPU bottleneck. But I'm new at this :-)
>
> Is there a general rule of thumb "if queue is longer than X, it is
> likely a bottleneck?"
>
> In reading an IBM Redbook on Linux performance, I also see this :
> "High numbers of context switches in connection with a large number of
> interrupts can signal driver or application issues."
>
> On my first test where the CPU is not pegged, context switching goes
> from about 3700 to about 4900, maybe averaging 4100

That's not too bad. If you see them in the 30k to 150k range, then worry about it.

> On the pegged test, the values are maybe 10% higher than that, maybe 15%.

That's especially good news. Normally when you've got a problem, it will increase in a geometric (or worse) way.

> It is an IBM 3550 with 8 cores, 2660.134 MHz (from dmesg), 32Gigs RAM

Like the other poster said, we likely don't have enough to tell you what's going on, but from what you've said here it sounds like you're mostly just CPU bound. That's assuming you're reading the output of vmstat and top and other tools like that.
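The numbers quoted above can be put side by side with a little arithmetic. The sketch below uses only figures from this thread (8 cores, average run queue of roughly 4.5 and 25, about 4100 context switches per second rising by up to 15%, and the 30k "worry" level from the reply). Dividing the run queue by the core count is an assumption on my part: it is a commonly cited heuristic, but nobody in this thread states it as the rule, and the helper name `runq_per_core` is hypothetical.

```python
CORES = 8                    # from the original post
CS_WORRY_THRESHOLD = 30_000  # "30k to 150k range" from the reply above


def runq_per_core(avg_runq, cores):
    """Average runnable processes per core (heuristic, see assumption above)."""
    return avg_runq / cores


# "averages 4 or 5" in the first test; 4.5 is taken as the midpoint.
first_ratio = runq_per_core(4.5, CORES)   # 0.5625
# "average maybe 25" in the last test.
last_ratio = runq_per_core(25, CORES)     # 3.125

# Context switches: about 4100/s unpegged, "maybe 15%" higher when pegged.
cs_pegged_high = 4100 * 1.15

print(f"first test: {first_ratio:.2f} runnable per core")
print(f"last test:  {last_ratio:.2f} runnable per core")
print("context switches below worry level:", cs_pegged_high < CS_WORRY_THRESHOLD)
```

Under that heuristic, a ratio well above 1 in the last test would be consistent with the "mostly just CPU bound" reading, while the context-switch rate stays far below the level the reply says to worry about.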
> Like the other poster said, we likely don't have enough to tell you
> what's going on, but from what you've said here it sounds like you're
> mostly just CPU bound. Assuming you're reading the output of vmstat
> and top and other tools like that.

Thanks. I used 'sadc' from the sysstat RPM (part of the sar suite) to collect data, and it does collect VM and other data like that from top and vmstat.

I did not see any irregular activity in those areas.

I realise I did not give you all enough details, which is why I worded my question the way I did: "is there a general rule of thumb for running queue"
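On Linux, the run queue and context-switch figures that these tools report can also be read from `/proc/stat`, which exposes a cumulative `ctxt` counter and an instantaneous `procs_running` count. The following is a rough sketch of turning two snapshots into a context-switch rate. The function names and the snapshot values are hypothetical and chosen only so the arithmetic is easy to follow; `sample_live` is defined but not called, and it assumes a Linux host.

```python
import time


def read_counters(stat_text):
    """Extract ctxt, procs_running and procs_blocked from /proc/stat text."""
    wanted = {"ctxt", "procs_running", "procs_blocked"}
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters


def ctxt_per_second(before, after, seconds):
    """ctxt counts up since boot, so the rate is the delta over the interval."""
    return (after["ctxt"] - before["ctxt"]) / seconds


def sample_live(interval=10):
    """Take two real snapshots from /proc/stat (Linux only, not run here)."""
    with open("/proc/stat") as f:
        before = read_counters(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_counters(f.read())
    return ctxt_per_second(before, after, interval), after["procs_running"]


# Invented snapshots, 10 seconds apart, for illustration only.
BEFORE = "cpu  100 0 50 1000 0 0 0 0 0 0\nctxt 1000000\nprocs_running 5\nprocs_blocked 0\n"
AFTER = "cpu  180 0 60 1010 0 0 0 0 0 0\nctxt 1041000\nprocs_running 6\nprocs_blocked 0\n"

rate = ctxt_per_second(read_counters(BEFORE), read_counters(AFTER), 10)
print("context switches per second:", rate)
```

Note that `procs_running` is a point-in-time value, so a single reading says little; averaging many samples is closer to what the graphed run queue in the original post represents.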
BTW, our designer got the nytprofile or whatever it is called for Perl and found out that it was a problem with the POE library that was being used as a state-machine to drive the whole load suite. It was taking something like 95% of the CPU time!

On Fri, Jun 19, 2009 at 11:59 AM, Alan McKay<alan.mckay@gmail.com> wrote:
> We did 4 tests, upping the load each time. The 3rd and 4th ones have
> all 8 cores pegged at about 95%. Yikes!
>
> Looks to me like it could be a CPU bottleneck. But I'm new at this :-)