Thread: Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)
From: "Jim C. Nasby"
Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
> > If you know when the checkpoint ended, and you know how long each of
> > the pieces took, you can reconstruct the other times easily. The way
> > you describe this it is true--that the summary is redundant given the
> > detail--but if you put yourself in the shoes of a log file parser the
> > other way around is easier to work with. Piecing together log entries
> > is a pain, splitting them is easy.
> >
> > If I had to only keep one line out of this, it would be the one with
> > the summary. It would be nice to have it logged at INFO.
>
> Yeah, if we have the summary line we don't need the other lines and vice
> versa. I have sympathy for parsing log files, I've done that a lot in
> the past and I can see what you mean. Having the individual lines is
> nice when you're monitoring a running system; you don't get the summary
> line until the checkpoint is finished. I suppose we can have both the
> individual lines and the summary, the extra lines shouldn't hurt anyone,
> and you won't get them unless you turn on the new log_checkpoints
> parameter anyway.

Not to beat a dead horse, but do we really want to force folks to be
parsing logs for performance monitoring? Especially if that log parsing
is just going to result in data being inserted into a table anyway?

I know there's concern about performance of the stats system and maybe
that needs to be addressed, but pushing users to log parsing is a lot of
extra effort, non-standard, likely to be overlooked, and doesn't play
well with other tools. It also conflicts with all the existing
statistics framework.

--
Jim Nasby                      jim@nasby.net
EnterpriseDB    http://enterprisedb.com    512.569.9461 (cell)
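(As an illustration of the parse-then-load workflow under discussion, here
is a minimal Python sketch that pulls a checkpoint summary line apart into
fields ready to be inserted into a table. The summary-line format and field
names are assumptions made up for this example; the wording produced by the
actual patch may differ.)

import re

# Hypothetical checkpoint summary line; the real format produced by the
# log_checkpoints patch may differ -- this only illustrates the
# parse-then-load step Jim describes.
SUMMARY_RE = re.compile(
    r"checkpoint complete: wrote (?P<buffers>\d+) buffers; "
    r"write=(?P<write_s>[\d.]+) s, sync=(?P<sync_s>[\d.]+) s, "
    r"total=(?P<total_s>[\d.]+) s"
)

def parse_checkpoint_lines(lines):
    """Turn matching log lines into dicts ready to be loaded into a table."""
    rows = []
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m:
            rows.append({
                "buffers": int(m.group("buffers")),
                "write_s": float(m.group("write_s")),
                "sync_s": float(m.group("sync_s")),
                "total_s": float(m.group("total_s")),
            })
    return rows

if __name__ == "__main__":
    sample = ["LOG:  checkpoint complete: wrote 3412 buffers; "
              "write=1.204 s, sync=0.310 s, total=1.690 s"]
    print(parse_checkpoint_lines(sample))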
Jim C. Nasby wrote:
> Not to beat a dead horse, but do we really want to force folks to be
> parsing logs for performance monitoring? Especially if that log parsing
> is just going to result in data being inserted into a table anyway?
>
> I know there's concern about performance of the stats system and maybe
> that needs to be addressed, but pushing users to log parsing is a lot of
> extra effort, non-standard, likely to be overlooked, and doesn't play
> well with other tools. It also conflicts with all the existing
> statistics framework.

One thing that doesn't seem to be getting looked at is the cost of
logging. Logging is very expensive. I don't know if it is more expensive
than the stats system, but you can cut your tps in half by having any
level of verbose logging on. Yes, that can be offset by pushing the
logging to another spindle and being careful about what you are logging,
but still.

Either way, we are taking the hit; it is just a matter of where. IMO it
would be better to have the information in the database, where it makes
sense, than pushing it out to a log that:

A. Will likely be forgotten
B. Is only accessible if you have shell access to the machine (not as
   common as all of us would like to think)

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)
From: Greg Smith
On Sat, 12 May 2007, Jim C. Nasby wrote:

> Not to beat a dead horse, but do we really want to force folks to be
> parsing logs for performance monitoring?

All of the really interesting DBA-level information about checkpoints is
now sitting in pg_stat_bgwriter. There really is no reason I'd expect this
information to be part of normal performance monitoring setup; for me it
has fit into troubleshooting and R&D.

When I'm talking about parsing logs in this context, it's for answering
questions like "how does fsync time correlate with amount of data written
at checkpoint time?" Since I've found it educational to sort through that,
I just didn't want the patch modified so that was harder to do than it had
to be. If I thought it worth the trouble to expose that level of
information via the stats interface, I'd have submitted that instead of
this log-based patch. This circles back to the previous discussion of
whether this particular information is strictly developer-level.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
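(For the correlation question Greg mentions, a minimal sketch of the
offline analysis, assuming the checkpoint records have already been parsed
out of the logs into (buffers_written, sync_seconds) pairs -- both field
names are placeholders invented for this example.)

import math

def pearson(pairs):
    """Pearson correlation between buffers written and sync time.

    `pairs` is a list of (buffers_written, sync_seconds) tuples, e.g. the
    output of a checkpoint-log parser; the exact fields are an assumption
    for this sketch, and degenerate inputs (fewer than two points, zero
    variance) are not handled.
    """
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Four invented checkpoints: (buffers written, fsync seconds)
print(pearson([(1200, 0.4), (5400, 1.9), (300, 0.1), (8800, 3.2)]))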
On Sat, 12 May 2007, Joshua D. Drake wrote:

> One thing that doesn't seem to be getting looked at is the cost of
> logging.

If any of this executed at something like the query level, sure, that
would be really important. The majority of the logging I suggested here is
of things that happen at checkpoint time, and the presumption is that the
overhead of a checkpoint is considerably greater than that of writing a
log line or even five. The least intensive event I would like to be
loggable is when a new WAL segment is created and cleared, which is again
a pretty small bit of log compared to the 16MB write. I wouldn't mind
seeing that exposed under pg_stats instead; I just had more interesting
things to statify first.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
On Sat, 2007-05-12 at 14:26 -0700, Joshua D. Drake wrote:

> Either way, we are taking the hit; it is just a matter of where. IMO it
> would be better to have the information in the database, where it makes
> sense, than pushing it out to a log

If performance monitoring information is provided as a database object,
what would the right interface be? IMHO the problem with cleanly
presenting monitoring information within a normal database system is that
this sort of data is fundamentally dynamic and continuous: to determine
how the performance of the system changes over time, you need to
repeatedly rescan the table/view/SRF and recompute your analysis
essentially from scratch. Trying to get even simple information like
"queries per second" from pg_stat_activity is an example of how this can
be painful.

<plug>
BTW, if the system included the concept of a continuous data *stream* as a
kind of database object, this problem would be much more tractable :) In
fact, there is some code in a version of TelegraphCQ that exposes various
information about the runtime state of the system as a set of
system-defined data streams -- like any other stream, users could then use
those streams in arbitrary queries.
</plug>

-Neil
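(A small sketch of the rescan-and-recompute pattern Neil describes:
turning repeated snapshots of a cumulative counter -- say, successive
readings of xact_commit from pg_stat_database -- into per-second rates.
The snapshot values below are invented for the example.)

def rates(snapshots):
    """Convert cumulative counter snapshots into per-second rates.

    `snapshots` is an iterable of (unix_timestamp, counter_value) pairs.
    Each rate only exists relative to the previous snapshot, which is the
    point: a plain table/view can show the counter, but the interesting
    number has to be recomputed from repeated scans.
    """
    prev_t = prev_v = None
    for t, v in snapshots:
        if prev_t is not None and t > prev_t:
            yield (v - prev_v) / (t - prev_t)
        prev_t, prev_v = t, v

# Three hypothetical readings taken 10 seconds apart:
print(list(rates([(0, 1000), (10, 1850), (20, 2900)])))   # [85.0, 105.0]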
Jim C. Nasby wrote:
> Moving to -hackers.
>
> On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
>>> If you know when the checkpoint ended, and you know how long each of
>>> the pieces took, you can reconstruct the other times easily. The way
>>> you describe this it is true--that the summary is redundant given the
>>> detail--but if you put yourself in the shoes of a log file parser the
>>> other way around is easier to work with. Piecing together log entries
>>> is a pain, splitting them is easy.
>>>
>>> If I had to only keep one line out of this, it would be the one with
>>> the summary. It would be nice to have it logged at INFO.
>>
>> Yeah, if we have the summary line we don't need the other lines and vice
>> versa. I have sympathy for parsing log files, I've done that a lot in
>> the past and I can see what you mean. Having the individual lines is
>> nice when you're monitoring a running system; you don't get the summary
>> line until the checkpoint is finished. I suppose we can have both the
>> individual lines and the summary, the extra lines shouldn't hurt anyone,
>> and you won't get them unless you turn on the new log_checkpoints
>> parameter anyway.
>
> Not to beat a dead horse, but do we really want to force folks to be
> parsing logs for performance monitoring? Especially if that log parsing
> is just going to result in data being inserted into a table anyway?
>
> I know there's concern about performance of the stats system and maybe
> that needs to be addressed, but pushing users to log parsing is a lot of
> extra effort, non-standard, likely to be overlooked, and doesn't play
> well with other tools. It also conflicts with all the existing
> statistics framework.

There are two counters for checkpoints in pgstats: the number of timed
checkpoints (triggered by checkpoint_timeout) and of requested checkpoints
(triggered by checkpoint_segments).

Maybe we should improve the stats system so that we can collect events
with timestamps and durations, but in my experience log files actually are
the most reliable and universal way to collect real-time performance
information. Any serious tool has a generic log parser. The other
alternative is SNMP. I welcome the efforts on pgsnmpd.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
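(For reference, a minimal sketch of reading those two counters from a
client, assuming the pg_stat_bgwriter view with checkpoints_timed and
checkpoints_req columns as in the 8.3-era stats system, and the psycopg2
driver; neither choice is specified in the thread, and any client
interface would do.)

import psycopg2  # assumption: psycopg2 driver; any client library works

def checkpoint_counters(dsn="dbname=postgres"):
    """Read the timed vs. requested checkpoint counters mentioned above.

    Assumes the pg_stat_bgwriter view with checkpoints_timed and
    checkpoints_req columns (8.3-era stats system).
    """
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter"
        )
        timed, requested = cur.fetchone()
        return {"timed": timed, "requested": requested}
    finally:
        conn.close()

if __name__ == "__main__":
    print(checkpoint_counters())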
Heikki Linnakangas wrote:
>>> Yeah, if we have the summary line we don't need the other lines and
>>> vice versa. I have sympathy for parsing log files, I've done that a
>>> lot in the past and I can see what you mean. Having the individual
>>> lines is nice when you're monitoring a running system; you don't get
>>> the summary line until the checkpoint is finished. I suppose we can
>>> have both the individual lines and the summary, the extra lines
>>> shouldn't hurt anyone, and you won't get them unless you turn on the
>>> new log_checkpoints parameter anyway.
>>
>> Not to beat a dead horse, but do we really want to force folks to be
>> parsing logs for performance monitoring? Especially if that log parsing
>> is just going to result in data being inserted into a table anyway?
>>
>> I know there's concern about performance of the stats system and maybe
>> that needs to be addressed, but pushing users to log parsing is a lot of
>> extra effort, non-standard, likely to be overlooked, and doesn't play
>> well with other tools. It also conflicts with all the existing
>> statistics framework.
>
> There are two counters for checkpoints in pgstats: the number of timed
> checkpoints (triggered by checkpoint_timeout) and of requested
> checkpoints (triggered by checkpoint_segments).
>
> Maybe we should improve the stats system so that we can collect events
> with timestamps and durations, but in my experience log files actually
> are the most reliable and universal way to collect real-time performance
> information. Any serious tool has a generic log parser. The other
> alternative is SNMP. I welcome the efforts on pgsnmpd.

pgsnmpd can't provide any information that's not in the backend, unless
we'd turn it into a log parser, which is not really something I think is a
good idea.

Log files are great for one kind of thing; "live statistics" through SNMP
or the statistics collector for another kind. It only goes wrong when you
put them in the wrong place. Things you poll regularly make a lot more
sense in some kind of live view than in a log file.

//Magnus
On Sun, May 13, 2007 at 07:54:20AM +0100, Heikki Linnakangas wrote:
> Maybe we should improve the stats system so that we can collect events
> with timestamps and durations, but in my experience log files actually
> are the most reliable and universal way to collect real-time performance
> information. Any serious tool has a generic log parser. The other
> alternative is SNMP. I welcome the efforts on pgsnmpd.

Having timing information in the stats system would be useful, but I'm not
sure how it could actually be done. But at least if the information is in
the stats system it's easy to programmatically collect and process. SNMP
is just one example of that (FWIW, I agree with Magnus that it probably
doesn't make sense to turn pgsnmpd into a log parser...).

--
Jim Nasby                      decibel@decibel.org
EnterpriseDB    http://enterprisedb.com    512.569.9461 (cell)
Heikki Linnakangas wrote:
> Jim C. Nasby wrote:
>
> There are two counters for checkpoints in pgstats: the number of timed
> checkpoints (triggered by checkpoint_timeout) and of requested
> checkpoints (triggered by checkpoint_segments).
>
> Maybe we should improve the stats system so that we can collect events
> with timestamps and durations, but in my experience log files actually
> are the most reliable and universal way to collect real-time performance
> information. Any serious tool has a generic log parser. The other
> alternative is SNMP. I welcome the efforts on pgsnmpd.

What about adding some DTrace probes?

Zdenek