Thread: what server stats to track / monitor ?
Hey folks,

I'm new to performance monitoring and tuning of PG/Linux (I have a fair bit of experience in Windows, though those skills were last used about 5 years ago).

I finally have Munin set up in my production environment, and my goodness it tracks a whole whack of stuff by default!

I want to turn off the graphing of unimportant data, to unclutter the graphs and focus on what's important.

So, from the perspective of both Linux and PG, is there a canonical list of "here are the most important X things to track"?

On the PG side I currently have 1 graph for # connections, another for DB size, and another for TPS. Then there are a few more graphs that are really cluttered up, each with 8 or 9 things on them.

On the Linux side, I clearly want to track HD usage, CPU, memory, but I'm not sure what aspects of each. There is also a default Munin graph for IO Stat - not sure what I am looking for there (I know what it does of course, just not sure what to look for in the numbers).

I know some of this stuff was mentioned at PG Con, so now I'm going back through all my notes and the videos. Already been reviewing.

If there is not already a wiki page for this I'll write one. I see this is a good general jump off point:

http://wiki.postgresql.org/wiki/Performance_Optimization

But jumping off from there (and searching on "Performance") does not come up with anything like what I am talking about.

Is there some good Linux performance monitoring and tuning reading that you can recommend?

thanks,
-Alan

--
“Mother Nature doesn’t do bailouts.”
- Glenn Prickett
On Fri, Jun 12, 2009 at 03:52:19PM -0400, Alan McKay wrote:
> I want to turn off the graphing of unimportant data, to unclutter the
> graphs and focus on what's important.

I'm unfamiliar with Munin, but if you can turn off the graphing (so as to achieve your desired level of un-cluttered-ness) without disabling the capture of the data that was being graphed, you'll be better off. Others' opinions may certainly vary, but in my experience, provided you're not causing a performance problem simply because you're monitoring so much stuff, you're best off capturing every statistic reasonably possible. The time will probably come when you'll find that that statistic, and all the history you've been capturing for it, becomes useful.

- Josh / eggyknap
> I'm unfamiliar with Munin, but if you can turn off the graphing (so as to
> achieve your desired level of un-cluttered-ness) without disabling the capture
> of the data that was being graphed, you'll be better off. Others' opinions may
> certainly vary, but in my experience, provided you're not causing a
> performance problem simply because you're monitoring so much stuff, you're
> best off capturing every statistic reasonably possible. The time will probably
> come when you'll find that that statistic, and all the history you've been
> capturing for it, becomes useful.

Yes, Munin does allow me to turn off graphing without turning off collecting.

Any pointers for good reading material here? Other tips?

--
“Don't eat anything you've ever seen advertised on TV”
- Michael Pollan, author of "In Defense of Food"
Yes, I'm familiar with Staplr - if anyone from myyearbook.com is listening in, I'm still hoping for that 0.7 update :-)

I plan to run both for the immediate term at least.

But this only concerns collecting - my biggest concern is how to read/interpret the data! Pointers to good reading material would be greatly appreciated.

On Fri, Jun 12, 2009 at 4:40 PM, Rauan Maemirov<rauan@maemirov.com> wrote:
> Hi Alan. For simple needs you can use Staplr, it's very easy to configure.
> There's also one - zabbix, pretty much.

--
“Don't eat anything you've ever seen advertised on TV”
- Michael Pollan, author of "In Defense of Food"
Hi Alan. For simple needs you can use Staplr, it's very easy to configure. There's also one - zabbix, pretty much.
On Fri, Jun 12, 2009 at 04:40:12PM -0400, Alan McKay wrote:
> Any pointers for good reading material here? Other tips?

The manuals and/or source code for your software? Stories, case studies, and reports from others in similar situations who have gone through problems? Monitoring's job is to avert crises by letting you know things are going south before they die completely. So you probably want to figure out ways in which your setup is most likely to die, and make sure the critical points in that equation are well-monitored, and you understand the monitoring. Provided you stick with it long enough, you'll inevitably encounter a breakdown of some kind or other, which will help you refine your idea of which points are critical.

Apart from that, I find it's helpful to read about statistics and formal testing, so you have some idea how confident you can be that the monitors are accurate, that your decisions are justified, etc. But that's not everyone's cup of tea...

- Josh / eggyknap
On Fri, 12 Jun 2009, Alan McKay wrote:

> So, from the perspective of both Linux and PG, is there canonical list
> of "here are the most important X things to track" ?

Not really, which is why you haven't gotten such a list from anyone here. Exactly what's important to track does vary a bit based on expected workload, and most of the people who have been through this enough to give you a good answer are too busy to write one (you've been in my "I should respond to that" queue for two weeks before I found time to write).

> Is there some good Linux performance monitoring and tuning reading
> that you can recommend?

The only good intro to this I've ever seen, from the perspective of monitoring things that would be useful to a database administrator, is the coverage of monitoring in "Performance Tuning for Linux Servers" by Johnson/Huizenga/Pulavarty. Their tuning advice wasn't so useful, but most OS tuning suggestions aren't either.

The more useful way to ask the question you'd like an answer to is "when my server starts to perform badly, what does that correlate with?" Find out what you need to investigate to figure that out, and you can determine what you should have been monitoring all along. That is unfortunately workload dependent; the stuff that tends to go wrong in a web app is very different from what happens to a problematic data warehouse, for example.

The basic important OS level stuff to watch is:

-Total memory in use
-All the CPU% numbers
-Disk read/write MB/s at all levels of granularity you can collect (total across the system, filesystem, array, individual disk). You'll only want to track the total until there's a problem, at which point it's nice to have more data to drill down into.

There's a bunch more disk and memory stats available, but I rarely find them of any use. The one Linux specific bit I do like to monitor is the line labeled "Writeback" in /proc/meminfo, because that's the best indicator of how much write caching is being done at the OS level. That's a warning sign of many problems in an area Linux often has problems with.

On the database side, you want to periodically check the important pg_stat_* views to get an idea how much activity is happening (and where it's happening), as well as looking for excessive dead tuples and bad index utilization (which manifests as things like too many sequential scans):

-pg_stat_user_indexes
-pg_stat_user_tables
-pg_statio_user_indexes
-pg_statio_user_tables

If your system is write-intensive at all, you should watch pg_stat_bgwriter too to keep an eye on when that goes badly.

At a higher level, it's a good idea to graph the size of the tables and indexes most important to your application over time. It can be handy to track things derived from pg_stat_activity too, like total connections and how old the oldest transaction is. pg_locks can be handy to track stats on too, something like these two counts over time:

    select (select count(*) from pg_locks where granted) as granted,
           (select count(*) from pg_locks where not granted) as ungranted;

That's the basic set I find myself looking at regularly enough that I wish I always had a historical record of them from the system. I never bothered to work this into a more formal article because of a) the workload specific stuff, which makes it complicated to explain for everyone, b) the wide variation in and variety of monitoring tools out there, and c) wanting to cover the material right, which takes a while to do on a topic this big.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
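[Illustration, not part of Greg's message] For anyone wanting to script a couple of the numbers mentioned above, here is a minimal Python sketch under stated assumptions. It pulls the "Writeback" line out of /proc/meminfo text and derives two ratios from a single pg_stat_user_tables row passed in as a dict. The function names writeback_kb and table_ratios are made up for this example, the choice of "dead fraction" and "sequential scan fraction" is just one possible reading of "excessive dead tuples" and "too many sequential scans", and no thresholds are implied. Fetching the rows from the database is left to whatever client library you already use.

```python
import re

# The pg_locks query from the message above, kept as a string so a
# collector can run it through any PostgreSQL client library.
LOCK_COUNTS_SQL = (
    "select (select count(*) from pg_locks where granted) as granted, "
    "(select count(*) from pg_locks where not granted) as ungranted;"
)


def writeback_kb(meminfo_text):
    """Return the 'Writeback' value in kB from /proc/meminfo text, or None."""
    # Match the exact label at line start so 'WritebackTmp' is not picked up.
    match = re.search(r"^Writeback:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def table_ratios(row):
    """Derive two ratios from one pg_stat_user_tables row given as a dict.

    Assumes the keys n_live_tup, n_dead_tup, seq_scan and idx_scan;
    missing or NULL (None) values are treated as zero.
    """
    live = row.get("n_live_tup") or 0
    dead = row.get("n_dead_tup") or 0
    seq = row.get("seq_scan") or 0
    idx = row.get("idx_scan") or 0  # NULL for tables without indexes
    tuples = live + dead
    scans = seq + idx
    return {
        "dead_fraction": dead / tuples if tuples else 0.0,
        "seq_scan_fraction": seq / scans if scans else 0.0,
    }
```

On a Linux box, writeback_kb(open("/proc/meminfo").read()) gives the current value; logging it alongside the per-table ratios at a fixed interval builds the kind of history discussed earlier in the thread.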
Thanks Greg! On Fri, Jun 26, 2009 at 11:27 PM, Greg Smith<gsmith@gregsmith.com> wrote: -- “Don't eat anything you've ever seen advertised on TV” - Michael Pollan, author of "In Defense of Food"