Thread: On Logging
Folks,

I've run into something that concerns me. It's pretty much an 8.2 issue, but
I'm hoping to stimulate some discussion on it. It's PostgreSQL's log files.
Right now, they're (sometimes just barely ;) human-readable, but they take
significant effort to parse. For example, pqa, a very clever piece of code,
is mostly devoted to parsing said files, and works only with significant
tweaking and restrictions on log file formats in 8.0.

Simple logging is a default that should probably not change, but I'm thinking
that for people who want to find something out from the logs, we could see
about a kind of plugin architecture which would enable things like:

* CSV
* YAML
* XML
* Piped logs, as Apache can do
* DB handle. I know this one will be controversial.

I'm thinking that a GUC variable (or should there be a class of them?) called
log_format would be part of the user interface to this and would be able to
switch from the cheap default code path to one that's more expensive, just as
log_statement does.

So, a few questions:

1. Am I the only one who wants an option for machine-readable logs?
2. Am I way off with the idea for an architecture for same?
3. What big things am I missing here?

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!
On 9/26/05, David Fetter <david@fetter.org> wrote:
> I've run into something that concerns me. It's pretty much an 8.2
> issue, but I'm hoping to stimulate some discussion on it. It's
> PostgreSQL's log files. Right now, they're (sometimes just barely ;)
> human-readable, but they take significant effort to parse. For
> example, pqa, a very clever piece of code, is mostly devoted to
> parsing said files and works only with significant tweaking and
> restrictions on log file formats in 8.0.

In a previous life (oh, like 6 months ago), I spent all my time working on
parsing log files from dozens of different software products, and I learned
something that made parsing some files orders of magnitude easier than
others:

Always use message codes.

Cisco does this, and it helps a lot. A few other vendors do this too, and it
helps a lot. While this might seem an old mainframeism, it's terribly useful
to have something at the beginning that tells you what the message is, what
it means, and most importantly, how to parse the rest.

I would be happy to help create this catalog, though it's definitely a big
step to implement. It would also require identifying every message that
could be generated -- something few open source projects do, but it is
critical to those of us who have to process the output!

> Simple logging is a default that should probably not change, but I'm
> thinking that for people who want to find something out from the logs,
> we could see about a kind of plugin architecture which would enable
> things like:
>
> * CSV

CSV is the best format, ever. Trivially simple to parse, it requires no
extra processing so long as you abide by a few extra rules, such as escaping.

> * YAML

Nice, but I think perhaps not the best format for logging. It's more of a
configuration file format in my mind, and it requires a bit more oomph to
parse. Not going to happen in AWK. :-)

> * Piped logs, as Apache can do

Useful, but it doesn't create any new capabilities, just simplifies some of
them. Focus on "new capabilities" first, then added functionality if
required.

> * DB handle. I know this one will be controversial.

I can't imagine why. :-)

> 1. Am I the only one who wants an option for machine-readable logs?

Not likely. I'd love it. It makes monitoring and reporting easier.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
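To illustrate Chris's point about message codes, here is a minimal sketch of
a dispatching parser. The code format and field names are invented for
illustration (loosely Cisco-styled); PostgreSQL has no such codes today,
which is exactly the gap being discussed.

```python
import re

# Hypothetical coded log line: "%FACILITY-SEVERITY-MNEMONIC: free text".
# With a stable code up front, a parser can dispatch on the code instead
# of guessing at free-form message text.
CODED_LINE = re.compile(r"^%(?P<code>[A-Z]+-\d+-[A-Z_]+): (?P<rest>.*)$")

def parse(line):
    m = CODED_LINE.match(line)
    if m is None:
        return {"code": None, "raw": line}  # fall back for uncoded lines
    return {"code": m.group("code"), "rest": m.group("rest")}

print(parse("%PGSQL-4-SLOW_QUERY: duration=1532ms stmt=SELECT 1"))
```

The key property is that adding a new message never breaks existing parsers:
they either know the code and parse the rest, or they don't and skip it.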
David Fetter wrote:
> Simple logging is a default that should probably not change, but I'm
> thinking that for people who want to find something out from the logs,
> we could see about a kind of plugin architecture which would enable
> things like:
>
> * CSV
> * YAML
> * XML
> * Piped logs, as Apache can do
> * DB handle. I know this one will be controversial.

This list doesn't seem to me to be all in the same category. The first 3
concern format, the last 2 concern destination (and as such probably don't
belong in this discussion).

ISTM what we need is a proposal for an abstract structure that will account
for all the possible logging messages. i.e. the important issue is not what
structuring mechanism is used, but what structure it reflects. For example,
we might decide that there are 10 message types and that each has certain
fields.

(And much as I know you like YAML, I don't think its use is sufficiently
widespread to belong here anyway.)

cheers

andrew
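Andrew's "abstract structure first" suggestion can be sketched as follows.
The record type and its fields are invented for illustration; the point is
that CSV, YAML, and XML then become mere serializations of one agreed shape.

```python
from dataclasses import dataclass, asdict

# Toy sketch: agree on the abstract record (message types, each with
# fixed fields) before arguing about the on-disk syntax.
@dataclass
class LogRecord:
    timestamp: str
    pid: int
    msg_type: str   # hypothetical, e.g. "connection", "statement", "error"
    detail: str

rec = LogRecord("2005-09-26 13:07:01", 4711, "statement", "SELECT 1")

# Any format plugin just serializes the same dict.
print(asdict(rec))
```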
Interesting. I am thinking we could put markers like '|' in the log output,
and then have some secondary process either remove them or add special
formatting to match the requested output format.

---------------------------------------------------------------------------

David Fetter wrote:
> I've run into something that concerns me. It's pretty much an 8.2
> issue, but I'm hoping to stimulate some discussion on it. It's
> PostgreSQL's log files. Right now, they're (sometimes just barely ;)
> human-readable, but they take significant effort to parse.
> [...]
> 1. Am I the only one who wants an option for machine-readable logs?
> 2. Am I way off with the idea for an architecture for same?
> 3. What big things am I missing here?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
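Bruce's marker idea might look like this in a secondary process: the backend
emits fields separated by '|', and a post-processor either strips the markers
or re-emits the fields in the requested format (CSV here). The field layout
is an assumption made up for the example.

```python
import csv
import io

# Hypothetical marked log line: timestamp|pid|severity|message.
def to_csv(marked_line):
    fields = marked_line.rstrip("\n").split("|")
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)   # handles quoting/escaping for us
    return buf.getvalue().strip()

print(to_csv("2005-09-26 13:07:01|4711|LOG|statement: SELECT 1"))
```

One weakness, worth noting against this scheme: a '|' occurring inside a
logged query would need its own escaping, which is the same problem CSV
already solves.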
David Fetter wrote:
> Simple logging is a default that should probably not change, but I'm
> thinking that for people who want to find something out from the logs,
> we could see about a kind of plugin architecture which would enable
> things like:

There are two other restrictions about the log files:

- There's no means of restricting logging to some patterns (e.g. specific
  backends only, certain clients, certain events except for log_duration).
- The query is truncated due to UDP restrictions.

I'd call this not necessarily a logging issue, but a profiling issue. I
regularly use MSSQL's profiler to tap an application's query traffic, to
find out what's going on, and I'd like the same feature on pgsql. This issue
comes up on -hackers regularly, namely logging to tables / logging as
inserts, and several others (I can cite them if necessary).

What I'd like is an extended logging/profiling facility that can be
en/disabled with finer granularity (performance/data volume issues), going
to an intermediate file/whatever and regularly converted to table data for
easier evaluation (which would fix the format question in the most
pgsql-like way).

Regards,
Andreas
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
>> - query is truncated due to UDP restrictions.
>
> Are you confusing the logs with pg_stat_activity?

Not confused. I'm talking about the case where statement logging is enabled;
I could have mentioned that...

Regards,
Andreas
David Fetter wrote:
> ...log file formats in 8.0....
>
> * CSV
> * YAML
> * XML
> * Piped logs, as Apache can do
> * DB handle. I know this one will be controversial.
> [...]
> 1. Am I the only one who wants an option for machine-readable logs?

I'd very much like a format that can be easily loaded into a database (not
necessarily the same one producing the logs :-) ) in real time and/or be
visible as a table through something like dbi-link.

I suppose any of the first three formats you suggest could work with
dbi-link; or another alternate format

* sql insert statements

would work if piped logs were supported by sending it to psql.
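Ron's alternate format could be sketched as a small formatter that turns each
log event into an SQL INSERT, which a piped-log handler could feed straight
to psql. The table and column names are invented; doubling single quotes is
used as a minimal escape.

```python
# Hypothetical target table: pg_log(ts, pid, message).
def to_insert(ts, pid, message):
    esc = message.replace("'", "''")   # minimal SQL string escaping
    return (f"INSERT INTO pg_log (ts, pid, message) "
            f"VALUES ('{ts}', {pid}, '{esc}');")

print(to_insert("2005-09-26 13:07:01", 4711, "can't open file"))
```

In practice one would want COPY rather than per-row INSERTs for volume, but
the INSERT form has the charm of being directly pipeable to psql.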
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> - query is truncated due to UDP restrictions.

Are you confusing the logs with pg_stat_activity?

regards, tom lane
On Mon, Sep 26, 2005 at 01:13:08PM -0400, Christopher Petrilli wrote:
> In a previous life (oh, like 6 months ago), I spent all my time
> working on parsing log files from dozens of different software
> products, and I learned something that made parsing some files
> orders of magnitude easier than others:
>
> Always use message codes.

Could you elucidate a bit on this as to how this might affect PostgreSQL
logging?

> Cisco does this, and it helps a lot. A few other vendors do this,
> and it helps a lot. While this might seem an old mainframeism,
                                             ^^^^^^^^^^^^^^^^^

You say that like it's a *bad* thing. I think some fruitful communication is
possible and has been missed over the decades between mainframe people and
*n*x people. The same applies to supercomputing people and *n*x people, but
that's a story for another day.

> it's terribly useful to have something at the beginning that tells
> you what the message is, what it means, and most importantly, how to
> parse the rest.

OK

> I would be happy to help create this catalog, though it's definitely
> a big step to implement. It would also require identifying every
> message that could be generated -- something few open source
> projects do, but it is critical to those of us who have to process
> the output!

Right. How big a project is this, and what kind of framework would we need
in order to assure that new messages come with new message codes?

> > Simple logging is a default that should probably not change, but
> > I'm thinking that for people who want to find something out from
> > the logs, we could see about a kind of plugin architecture which
> > would enable things like:
> >
> > * CSV
>
> CSV is the best format, ever. Trivially simple to parse, it
> requires no extra processing so long as you abide by a few extra
> rules, such as escaping.

I agree that it's nice, but seeing as how many smart people have stubbed
their toes on the various incarnations of "CSV," I must disagree as to its
simplicity.

> > * YAML
>
> Nice, but I think perhaps not the best format for logging. It's
> more of a configuration file format in my mind, and it requires a
> bit more oomph to parse. Not going to happen in AWK. :-)

It's not bad for logging, partly because it's a lot fewer bytes than XML or
SGML, but it maintains a structure. Of course, it's not as "simple" in some
sense as CSV.

> > * Piped logs, as Apache can do
>
> Useful, but it doesn't create any new capabilities, just simplifies
> some of them. Focus on "new capabilities" first, then added
> functionality if required.

Fair enough :)

> > * DB handle. I know this one will be controversial.
>
> I can't imagine why. :-)

Heh

> > 1. Am I the only one who wants an option for machine-readable logs?
>
> Not likely. I'd love it. It makes monitoring and reporting easier.

That's where I've run across this :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!
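David's toe-stubbing caution about CSV is easy to demonstrate: a naive
split(",") breaks as soon as a field contains a comma, quote, or newline,
which logged SQL statements routinely do. A real CSV writer/reader pair
round-trips such fields; the log fields below are invented for illustration.

```python
import csv
import io

# A log row whose second field contains a quote, a comma, and a newline.
row = ["2005-09-26 13:07:01", 'SELECT "a",\nb FROM t']

buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue()

# The writer quoted and escaped the awkward field, so reading it back
# recovers the original row exactly...
assert next(csv.reader(io.StringIO(line))) == row
# ...whereas a naive comma split would see too many fields.
assert line.count(",") > 1

print(line)
```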
On Mon, Sep 26, 2005 at 10:57:54AM -0700, Ron Mayer wrote:
> I'd very much like a format that can be easily loaded into
> a database (not necessarily the same one producing the logs :-) )
> in real time and/or be visible as a table through something
> like dbi-link.
>
> I suppose any of the first three formats you suggest could work
> with dbi-link; or another alternate format
> * sql insert statements
> would work if piped logs were supported by sending it to psql.

Apache seems to have the best, most flexible logging of anything out there,
and should probably be used as a model. It's pretty easy to have it actually
log to a database.

Whatever method we decide on, I think it would be very useful if we
supported multiple logging streams. I certainly wouldn't want to give up a
human-readable log to get a CSV one.

Is a logging mechanism the best way to do profiling? Seems like it might be
better to have a more efficient, dedicated method. But I'm not against
adding capabilities like per-backend logging, etc.

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Fri, Sep 30, 2005 at 05:54:49PM -0500, Jim C. Nasby wrote:
> Apache seems to have the best, most flexible logging of anything out
> there, and should probably be used as a model. It's pretty easy to
> have it actually log to a database.

Great :)

> Whatever method we decide on, I think it would be very useful if we
> supported multiple logging streams. I certainly wouldn't want to
> give up a human-readable log to get a CSV one.

Excellent idea.

> Is a logging mechanism the best way to do profiling? Seems like it
> might be better to have a more efficient, dedicated method.

I'm not totally confident in my ability to think up, in advance, everything
I'd want to look at and every way I'd want to look at it. I'm pretty sure I
can't come up with every way everyone could want to do profiling, though.

> But I'm not against adding capabilities like per-backend logging,
> etc.

How would per-backend logging work?

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!
On Fri, Sep 30, 2005 at 06:24:17PM -0700, David Fetter wrote:
> How would per-backend logging work?

I'd suggest having settings for a per-backend 'debug' logging mode that
could be triggered either via a SQL command or a signal to the backend. It
would be useful to be able to log this to a separate area, based either on
PID or some identifier passed to the SQL command. I think this would cover
two use cases:

1. A long-running process that's on one backend that you want info on (send
   a signal to that backend).

2. Something using a connection pool. You'd have some way to tell the
   application to enable logging based on some set of conditions. When those
   conditions were met, logging would be turned on. When a connection is
   first grabbed from the connection pool, logging would be forced off in
   case it had been turned on by a previous process (which might have just
   disconnected suddenly).

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
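Jim's pool rule can be sketched as follows: on every checkout, the pool
unconditionally resets logging in case a previous holder left it on. The
pool classes are toy stand-ins, and log_min_duration_statement is used here
only as a convenient existing GUC; the per-backend debug mode under
discussion doesn't exist yet.

```python
# Toy connection object that just records what SQL was sent to it.
class PooledConnection:
    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)

class Pool:
    def __init__(self, conns):
        self.conns = list(conns)

    def checkout(self):
        conn = self.conns.pop()
        # Reset debug logging unconditionally on every checkout, so a
        # connection whose previous holder enabled logging (and perhaps
        # disconnected suddenly) comes back in a known state.
        conn.execute("SET log_min_duration_statement = -1")
        return conn

pool = Pool([PooledConnection()])
conn = pool.checkout()
print(conn.executed)
```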