Thread: WIP patch - INSERT-able log statements
Hi, I've been working on the following TODO item and attached is an initial patch. (It is only partial and not yet completelyfunctional) "Allow server log information to be output as INSERT statements This would allow server log information to be easily loaded into a database for analysis. " I want to confirm, if what I have done so far is what community is looking for and also want to clear some doubts. What is done so far --------------- Two postgresql.conf variables #log_output_type = 'text' #Valid values are 'SQL' or 'text' #log_output_table_name = 'auditlogs' These control how to output the log. Defaults to 'text' which is status quo. If it is set to 'SQL' log will be output asINSERT commands. The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the userto give one and he can create it if and when he wants to. The alternative is we decide the table name and make initdbto create one. The proposed log output structure ------------------ INSERT INTO user_defined_table values( timestamp_with_milliseconds, timestamp, username, databasename, sessionid, host_and_port,host, proc_id, command_tag, session_start, transaction_id, error_severity, SQL_State_Code, error_message); All these columns will follow the current rules of log output. ie, unless explicity requested by the user, these columnswill have NULL. User can still give log_line_prefix in any order he wants, and logger will output it in appropriatecolumns. The code has been modified to do this. Issues/Questions are: - How about 'Statement duration log'. This will come to the logger as a single string and after the query execution. Inthe existing log we can make sense of the duration log by matching it with the statement above it or by the statement whichgets printed besides it (Again as a single string). But when this is loaded onto a table doesn't make much sense untless everything is in a single row. (Mypreferred option is to add another column to the table structure defined above as 'duration'. But haven't figured outhow to achieve this, because the statement is printed first and then the duration as another log.) - If the SQL log output is to the syslog, then it becomes pretty awkward and possibly useless because our current syslogwriter function breaks up the log into several lines to accomodate various platforms. Syslog also then adds other informationbefore outputting it, which cannot be loaded onto a table. The preferred option is to educate the user through documentation that SQL type log outputis best served when it is output to stderr and redirected to a file? Same goes with other aspects such as verbose andvarious other statistics log. - There are also other minor issues such as, the actual query currently gets output in log as 'Statement: CREATE ........'.For sql type log we may not need the 'Statement:' part as it will be in a column ? Do we remove this in both textand SQL outputs ? Rgds, Arul Shaji This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive thisemail in error, please return to sender. Thank you. If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please emailunsubscribe@fast.fujitsu.com.au
Attachment
"FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes: > The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the userto give one and he can create it if and when he wants to. The alternative is we decide the table name and make initdbto create one. Why not output the data in COPY format instead? That (a) eliminates the problem of needing to predetermine a destination table name, and (b) should be considerably faster to load than thousands of INSERT statements. regards, tom lane
On Fri, 16 Feb 2007 11:50, Tom Lane wrote: > "FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes: > > The second variable is of interest. We need to specify a table in the > > insert command. My preferred option is for the user to give one and he > > can create it if and when he wants to. The alternative is we decide the > > table name and make initdb to create one. > > Why not output the data in COPY format instead? That (a) eliminates the > problem of needing to predetermine a destination table name, and (b) > should be considerably faster to load than thousands of INSERT statements. Yeah, that was my initial idea too... But because the TODO item clearly mentions INSERT, I thought maybe there was some very specific reason for the output to be in INSERT stmts.. .. Rgds, Arul Shaji > > regards, tom lane This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive thisemail in error, please return to sender. Thank you. If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please emailunsubscribe@fast.fujitsu.com.au
"FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes: > On Fri, 16 Feb 2007 11:50, Tom Lane wrote: >> Why not output the data in COPY format instead? > Yeah, that was my initial idea too... But because the TODO item clearly > mentions INSERT, I thought maybe there was some very specific reason for the > output to be in INSERT stmts.. .. I don't think the TODO entry was particularly carefully thought out. regards, tom lane
Tom Lane wrote: > "FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes: > > On Fri, 16 Feb 2007 11:50, Tom Lane wrote: > >> Why not output the data in COPY format instead? > > > Yeah, that was my initial idea too... But because the TODO item clearly > > mentions INSERT, I thought maybe there was some very specific reason for the > > output to be in INSERT stmts.. .. > > I don't think the TODO entry was particularly carefully thought out. ... which is why discussing stuff in -hackers *before* coding tends to be a good idea. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
FAST PostgreSQL wrote: > On Fri, 16 Feb 2007 11:50, Tom Lane wrote: >> "FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes: >>> The second variable is of interest. We need to specify a table in the >>> insert command. My preferred option is for the user to give one and he >>> can create it if and when he wants to. The alternative is we decide the >>> table name and make initdb to create one. >> Why not output the data in COPY format instead? That (a) eliminates the >> problem of needing to predetermine a destination table name, and (b) >> should be considerably faster to load than thousands of INSERT statements. > > Yeah, that was my initial idea too... But because the TODO item clearly > mentions INSERT, I thought maybe there was some very specific reason for the > output to be in INSERT stmts.. .. COPY would be a good option, but INSERT is probably what I would use as the default. The most use I see for this is something where I am tailing out the log and inserting live into a log db... Sincerely, Joshua D. Drake > > Rgds, > Arul Shaji > > > >> regards, tom lane > This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive thisemail in error, please return to sender. Thank you. > > If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please emailunsubscribe@fast.fujitsu.com.au > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On Sat, 17 Feb 2007, FAST PostgreSQL wrote: > #log_output_type = 'text' #Valid values are 'SQL' or 'text' > Defaults to 'text' which is status quo. If it is set to 'SQL' log will > be output as INSERT commands. This assumes someone wants either the INSERT-able logs or the current, easily readable ones. I know I don't want either--I want both. There are times I want to look through the logs with a text editor, there are times where I want to query against them. I would suggest treating this similarly to how the Windows eventlog is handled: made SQL INSERT format another option available to log_destination, so it can be combined with the existing formats. In addition to the syslog concerns you already mentioned (which are themselves a showstopper for using this feature in companies that rely on or aggregate syslogs), I know I'd want to keep the existing logs rolling in parallel while I tested out the SQL-based version for a while, before cutting over to exclusively INSERT format logs. I've thought a bit about how to implement this TODO already (I have a log file parser and I hate maintaining it), and the only thing that made sense to me was giving a new parameter with the filename to output to in this format. For example, make a new log_sql_filename with the same syntax already used for log_filename. There will probably need to be a second parameter for the table name to insert into as you've already commented on. And like Joshua has already suggested, the main useful applications for this feature I've thought of all involve reading from the INSERT-able logs in real-time, using something like "tail -f", and pumping that data immediately into a logger table. Also, I feel that supporting the whole log_line_prefix syntax for this feature is not just overkill, it's a bad idea. Output everything in a standard, complete format instead, and then it becomes easy for the community at large to build tools on top of that to analyze the log database entries instead of having so many ad-hoc approaches. You want a subset, use a view or copy just the fields you want into another table. I would guess this simplifies the patch as well. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Added -hackers to CC:. On 2/18/07, Greg Smith <gsmith@gregsmith.com> wrote: > I've thought a bit about how to implement this TODO already (I have a log > file parser and I hate maintaining it) Any problem using pgFouine? > Also, I feel that supporting the whole log_line_prefix syntax for this > feature is not just overkill, it's a bad idea. Output everything in a > standard, complete format instead, and then it becomes easy for the > community at large to build tools on top of that to analyze the log > database entries instead of having so many ad-hoc approaches. You want a > subset, use a view or copy just the fields you want into another table. > I would guess this simplifies the patch as well. I agree with you on this point. We need every information we can put in the database (database, user and so on) in a structured way. Logging statements is not the only thing to do IMHO. Logging errors in another table is important too. I'm pretty sure there are a lot people who don't know if there are errors in their statements. Anyway, if something is developed to do that, I'll extend pgFouine to support it (or I'll write another application to do it) and I'm pretty sure others will do. The most annoying and difficult part of the work on pgFouine/PQA/... is to maintain the parser. If we don't have to do that anymore, we'll have more time to think about new features. In fact, it was a long plan for pgFouine to be able to store the results of its analysis in a database so it's a good news if it's implemented in core. -- Guillaume
On Sun, 18 Feb 2007 18:25, Greg Smith wrote: > On Sat, 17 Feb 2007, FAST PostgreSQL wrote: > > #log_output_type = 'text' #Valid values are 'SQL' or 'text' > > Defaults to 'text' which is status quo. If it is set to 'SQL' log will > > be output as INSERT commands. > > This assumes someone wants either the INSERT-able logs or the current, > easily readable ones. I know I don't want either--I want both. There are > times I want to look through the logs with a text editor, there are times > where I want to query against them. > > I would suggest treating this similarly to how the Windows eventlog is > handled: made SQL INSERT format another option available to > log_destination, so it can be combined with the existing formats. In I think adding the 'format' of the log as an option in the 'destination' may be confusing. We can have a new boolean variable like 'output_sql_log' or 'log_sql_format' which will trigger the output of INSERT-able log in addition to syslog/stderr/eventlog in text format as it is now. > addition to the syslog concerns you already mentioned (which are > themselves a showstopper for using this feature in companies that rely on > or aggregate syslogs), I know I'd want to keep the existing logs rolling > in parallel while I tested out the SQL-based version for a while, before > cutting over to exclusively INSERT format logs. > > I've thought a bit about how to implement this TODO already (I have a log Any thoughts on how to output the duration in the same INSERT as that of the query without having to output two entries per statement as it is done in text now? ? > file parser and I hate maintaining it), and the only thing that made sense > to me was giving a new parameter with the filename to output to in this > format. For example, make a new log_sql_filename with the same syntax > already used for log_filename. There will probably need to be a second > parameter for the table name to insert into as you've already commented > on. And like Joshua has already suggested, the main useful applications This would mean that the user will have to maintain two sets of variable. One for text output (if redirect stderr is on) and another for sql output. My preference would be for the sql-logs to use the same variables as that of redirect_stderr. In the directory mentioned by the user on log_directory we just output the sql logs in a file with an '.SQL' appended to the filename specified by the user in log_filename. This also means we can make use of the log_truncation and log_rotate options (I hope. I will have a look at that). Of course we document this making user aware of this. (The other option is we output the logs in a subdirectory called 'SQL' which we can silently create, if we are allowed to ! ! !) This also means only two new variable is created in postgresql.conf for this feature. To trigger the output and the table name. (Or we can introduce only the table name variable which will automatically trigger the SQL log output. Maybe confusing ? ? ) Rgds, Arul Shaji > for this feature I've thought of all involve reading from the INSERT-able > logs in real-time, using something like "tail -f", and pumping that data > immediately into a logger table. > > Also, I feel that supporting the whole log_line_prefix syntax for this > feature is not just overkill, it's a bad idea. Output everything in a > standard, complete format instead, and then it becomes easy for the > community at large to build tools on top of that to analyze the log > database entries instead of having so many ad-hoc approaches. You want a > subset, use a view or copy just the fields you want into another table. > I would guess this simplifies the patch as well. This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive thisemail in error, please return to sender. Thank you. If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please emailunsubscribe@fast.fujitsu.com.au
On Tue, 20 Feb 2007, FAST PostgreSQL wrote: > I think adding the 'format' of the log as an option in the 'destination' > may be confusing. We can have a new boolean variable like > 'output_sql_log' or 'log_sql_format' which will trigger the output of > INSERT-able log in addition to syslog/stderr/eventlog in text format as > it is now. What's confusing about it? Consider this hypothetical postgresql.conf snippet: log_destination = 'stderr,sql' # Valid values are combinations of # stderr, syslog, sql, and eventlog, # depending on platform. # These are relevant when logging to sql: log_sql_table = 'pg_log' # Table SQL formatted logs INSERT into # Default is 'pg_log' Adding new GUC variables isn't without cost; no reason to add a new one when there's a perfectly good destination one already whose syntax is completely appropriate for this task. I personally would just ignore the duration two entries per statement problem and make that the log analyzer software's job to fix, but I'm likely biased here because I don't ever do anything with that data. > My preference would be for the sql-logs to use the same variables as > that of redirect_stderr. In the directory mentioned by the user on > log_directory we just output the sql logs in a file with an '.SQL' > appended to the filename specified by the user in log_filename. This > also means we can make use of the log_truncation and log_rotate options > (I hope. I will have a look at that). You're talking along the same lines here I was trying to suggest in my previous message. Keep all the current behavior as far as truncation and rotation go, just provide a different name for the file. If you just appended a suffix like .sql to the existing name, that would remove another GUC variable requirement. I think eventually people will complain about this, and want a separately formatted filename altogether, but there's nothing wrong with the approach you suggest for a first version of this feature. I know I'm so desparate for a better solution to log importing that I'd be happy with any workable improvement. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On 2/19/07, Greg Smith <gsmith@gregsmith.com> wrote: > log_destination = 'stderr,sql' # Valid values are combinations of > # stderr, syslog, sql, and eventlog, > # depending on platform. +1 > # These are relevant when logging to sql: > log_sql_table = 'pg_log' # Table SQL formatted logs INSERT into > # Default is 'pg_log' Is there really a need for that? Why not simply put something like %log_table% in the sql file and let the admin replace it with sed or whatever he likes? And it could allow us to move to copy without having to drop the code added to manage the new GUC variable. > I personally would just ignore the duration two entries per statement > problem and make that the log analyzer software's job to fix, but I'm > likely biased here because I don't ever do anything with that data. We have basically 4 different cases: * log_min_duration_statement = 0: every query is logged with the duration on the same line, * log_statement = all: we don't have any duration, * log_statement = all & log_duration = on: we have every query and the duration on another line, * log_min_duration_statement = n > 0 & log_duration = on: we have duration lines for every query and statement + duration if the query is slower than n ms. There are other variants but I think they are the main cases to deal with. -- Guillaume
Guillaume Smet escribió: > On 2/19/07, Greg Smith <gsmith@gregsmith.com> wrote: > >log_destination = 'stderr,sql' # Valid values are combinations of > > # stderr, syslog, sql, and eventlog, > > # depending on platform. > > +1 Please don't do that. We already have a "combined GUC option" that is used to change two different things (DateStyle) and I regularly see people confused about how to use it. Also, "sql" is not really a destination -- it is a format. Maybe have a "format=plain,sql" GUC var is best: plain format is logged as currently, and SQL format is logged somewhere else (maybe use the same name, and stash a .sql suffix to the filename, or use .sql.log). That way you can choose to have one or the other, or both if you're really dumb. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Please don't do that. We already have a "combined GUC option" that is > used to change two different things (DateStyle) and I regularly see > people confused about how to use it. Perhaps I don't understand your "combined GUC option" but ISTM it's already the case for log_destination. I don't use this ability but the current comment in postgresql.conf seems to be self explanatory: log_destination = 'syslog' # Valid values are combinations of # stderr, syslog and eventlog, # depending on platform. > Also, "sql" is not really a destination -- it is a format. Maybe have a > "format=plain,sql" GUC var is best: plain format is logged as currently, > and SQL format is logged somewhere else (maybe use the same name, and > stash a .sql suffix to the filename, or use .sql.log). That way you can > choose to have one or the other, or both if you're really dumb. I don't see how you will deal with your format GUC variable. Or it won't be possible to have plain and sql at the same time? My problem is how you will separate plain and sql if you put them in the stderr or syslog. The configuration which may interest me at first is to have syslog plain text output to run pgFouine daily and from time to time use the SQL output (I won't be able to insert into a db my daily logs so I will do it sometimes if I need more details, ability to look for a particular query and so on). I don't know how other people plan to use it though. -- Guillaume
Guillaume Smet escribió: > On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > >Please don't do that. We already have a "combined GUC option" that is > >used to change two different things (DateStyle) and I regularly see > >people confused about how to use it. > > Perhaps I don't understand your "combined GUC option" but ISTM it's > already the case for log_destination. I don't use this ability but the > current comment in postgresql.conf seems to be self explanatory: > log_destination = 'syslog' # Valid values are combinations of > # stderr, syslog and eventlog, > # depending on platform. That defines one thing: where does the log output go. It can be syslog, stderr, eventlog, or any combination thereof. But it's only one thing that you're configuring. In datestyle, you set two things: output format and input format. That's problematic because people changes one of the settings and then expect the other to change too, and get confused because it doesn't work. What I'm saying is that this was not a very good design and please let's not repeat the mistake here. > >Also, "sql" is not really a destination -- it is a format. Maybe have a > >"format=plain,sql" GUC var is best: plain format is logged as currently, > >and SQL format is logged somewhere else (maybe use the same name, and > >stash a .sql suffix to the filename, or use .sql.log). That way you can > >choose to have one or the other, or both if you're really dumb. > > I don't see how you will deal with your format GUC variable. Or it > won't be possible to have plain and sql at the same time? My problem > is how you will separate plain and sql if you put them in the stderr > or syslog. Well, to syslog or eventlog you don't send the SQL format, only plain. If you configure the SQL format, then it's expected that stderr is enabled, so it would be an error if you enable SQL format but not stderr. If redirect_stderr is also enabled then you could enable both plain format and SQL format, and have each logged to a different file (though I repeat that IMHO that would be a dumb thing to do). > The configuration which may interest me at first is to have syslog > plain text output to run pgFouine daily and from time to time use the > SQL output (I won't be able to insert into a db my daily logs so I > will do it sometimes if I need more details, ability to look for a > particular query and so on). > I don't know how other people plan to use it though. I don't think the syslog option is so nice to use these days; the redirect_stderr stuff is more powerful and easy to use. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > I don't think the syslog option is so nice to use these days; the > redirect_stderr stuff is more powerful and easy to use. Did you already analyze logs of a highly loaded platform using stderr? It's impossible to guarantee the consistency of the queries because you can have: Query 1 Line 1 Query 2 Line 1 Query 3 Line 1 Query 1 Line 2 Query 2 Line 2 Query 1 Line 3 Query 3 Line 2 (each query is run on a different backend). With stderr you don't have sufficient information to know the query you should append the text to. Syslog adds useful context information (pid, command line, command number) which allows you to guarantee the consistency. It's a real problem for us and that's why I don't recommend using stderr when you use pgFouine. Moreover, syslog can send the log to the network which moves the I/O needed to log on another box and it's necessary in our case. -- Guillaume
Guillaume Smet escribió: > On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > >I don't think the syslog option is so nice to use these days; the > >redirect_stderr stuff is more powerful and easy to use. > > Did you already analyze logs of a highly loaded platform using stderr? > It's impossible to guarantee the consistency of the queries because > you can have: > Query 1 Line 1 > Query 2 Line 1 > Query 3 Line 1 > Query 1 Line 2 > Query 2 Line 2 > Query 1 Line 3 > Query 3 Line 2 > (each query is run on a different backend). With stderr you don't have > sufficient information to know the query you should append the text > to. Syslog adds useful context information (pid, command line, command > number) which allows you to guarantee the consistency. So add the session ID (%c) to log_line_prefix. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > So add the session ID (%c) to log_line_prefix. It could work if log_line_prefix was added before every line but it's definitely not the case: myuser mydb 45d9d615.4abe LOG: duration : 185.223 ms, statement : SELECT * FROM lieu LIMIT 10; if you execute: SELECT * FROM lieu LIMIT 10; I worked a *lot* on query logging and I really didn't find any solution to use stderr in a safe way, reason why I don't use it at all and I don't recommend it (even if pgFouine supports it, it outputs a warning as soon as the parser finds a multiline query). -- Guillaume
Guillaume Smet escribió: > On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote: > >So add the session ID (%c) to log_line_prefix. > > It could work if log_line_prefix was added before every line but it's > definitely not the case: > myuser mydb 45d9d615.4abe LOG: duration : 185.223 ms, statement : SELECT * > FROM lieu > LIMIT 10; > if you execute: > SELECT * > FROM lieu > LIMIT 10; Interesting. I wonder why didn't you report this as a bug before? Maybe we could have discussed it and fixed it. This is irrelevant in this particular discussion anyway if we introduce SQL format, because then the newline should be part of the COPY or INSERT data. Let's just make sure to not make the same errors again. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
"Guillaume Smet" <guillaume.smet@gmail.com> writes: > (each query is run on a different backend). With stderr you don't have > sufficient information to know the query you should append the text > to. Syslog adds useful context information (pid, command line, command > number) which allows you to guarantee the consistency. > It's a real problem for us and that's why I don't recommend using > stderr when you use pgFouine. Of course, the other side of that coin is that syslog is known to drop messages altogether under sufficient load. (At least on some platforms; dunno about yours.) regards, tom lane
Tom Lane escribió: > "Guillaume Smet" <guillaume.smet@gmail.com> writes: > > (each query is run on a different backend). With stderr you don't have > > sufficient information to know the query you should append the text > > to. Syslog adds useful context information (pid, command line, command > > number) which allows you to guarantee the consistency. > > It's a real problem for us and that's why I don't recommend using > > stderr when you use pgFouine. > > Of course, the other side of that coin is that syslog is known to drop > messages altogether under sufficient load. (At least on some platforms; > dunno about yours.) So lets fix stderr by having a true per-line prefix? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 2/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Of course, the other side of that coin is that syslog is known to drop > messages altogether under sufficient load. (At least on some platforms; > dunno about yours.) Yes I know. That's one of the reason why I asked for the bahaviour of 7.4 log_duration back in 8.2. It's a good compromise which allows us not to lose lines and have a good level of information (at least, the best we can have). Async IO helps. Moreover we use syslog to send the log lines via UDP so we know that it's not perfect. But it works nice most of the time. We know that we can't log every query (we use a combination of log_duration and log_min_duration_statement - I patched 8.1 for that) because if we do so we lose a lot of lines and queries are not consistent but we can't do it locally with stderr anyway due to I/O. This method has been reliable for more than a year and our daily reports are consistent. -- Guillaume
Ok..... Summarizing the key changes required on my patch, based on the discussions so far are : - The log_destination will include a new option 'sql'. This can be given with other combinations of stderr, syslog or eventlog. - The sql logs will be written in log_directory in a file log_filename.SQL - The log output will be in COPY format and will include the following information, irrespective of the log_line_prefix setting. ( timestamp_with_milliseconds, timestamp, username, databasename, sessionid, host_and_port, host, proc_id, command_tag, session_start, transaction_id, error_severity, SQL_State_Code, statement/error_message); Anything else missing ? ? Rgds, Arul Shaji On Tue, 20 Feb 2007 19:03, Guillaume Smet wrote: > On 2/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Of course, the other side of that coin is that syslog is known to drop > > messages altogether under sufficient load. (At least on some platforms; > > dunno about yours.) > > Yes I know. That's one of the reason why I asked for the bahaviour of > 7.4 log_duration back in 8.2. It's a good compromise which allows us > not to lose lines and have a good level of information (at least, the > best we can have). Async IO helps. > > Moreover we use syslog to send the log lines via UDP so we know that > it's not perfect. But it works nice most of the time. We know that we > can't log every query (we use a combination of log_duration and > log_min_duration_statement - I patched 8.1 for that) because if we do > so we lose a lot of lines and queries are not consistent but we can't > do it locally with stderr anyway due to I/O. This method has been > reliable for more than a year and our daily reports are consistent. This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive thisemail in error, please return to sender. Thank you. If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please emailunsubscribe@fast.fujitsu.com.au
"FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes: > - The log output will be in COPY format and will include the following > information, irrespective of the log_line_prefix setting. > ( timestamp_with_milliseconds, �timestamp, username, �databasename, > sessionid, �host_and_port, host, proc_id, command_tag, �session_start, > transaction_id, �error_severity, �SQL_State_Code, statement/error_message); How exactly are you fitting the message structure (primary/detail/context lines) into this? It looks like your proposal loses that structure ... A smaller problem is that this forces people to incur a gettimeofday call for every message logged; depending on your hardware that can be a pretty nasty overhead. Some people might find some of the other columns not worth their weight, either. Is it worth providing a knob to determine the set of columns emitted? regards, tom lane
On Wed, 21 Feb 2007 12:08, Tom Lane wrote: > "FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes: > > - The log output will be in COPY format and will include the following > > information, irrespective of the log_line_prefix setting. > > ( timestamp_with_milliseconds, timestamp, username, databasename, > > sessionid, host_and_port, host, proc_id, command_tag, session_start, > > transaction_id, error_severity, SQL_State_Code, > > statement/error_message); > > How exactly are you fitting the message structure > (primary/detail/context lines) into this? It looks like your proposal > loses that structure ... Sorry, didn't understand.... Can you please elaborate ? > A smaller problem is that this forces people to incur a gettimeofday > call for every message logged; depending on your hardware that can be a > pretty nasty overhead. Some people might find some of the other columns > not worth their weight, either. Is it worth providing a knob to > determine the set of columns emitted? Totally agree. My original patch infact uses log_line_prefix. So the user can fill in the columns he wants by turning on appropriate settings in log_line_prefix. The columns which he hasn't turned on will be output as NULL in the sql outptut. But I can also see merit in others' request that it is ideal to have all possible info in the sql log, so that once the log is loaded into the table, it can be queried, sub-tabled, created view to analyze it in whatever way..... Rgds, Arul Shaji > > regards, tom lane This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive thisemail in error, please return to sender. Thank you. If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please emailunsubscribe@fast.fujitsu.com.au
On Tue, 20 Feb 2007, Tom Lane wrote: > A smaller problem is that this forces people to incur a gettimeofday > call for every message logged I'm stumped trying to think of an application that would require importing the logs into a database to analyze them, but not need the timestamp. I'd expect it to be the primary key on the data. > Is it worth providing a knob to determine the set of columns emitted? Myself and Guillaume felt that having the format be standardized had significant value from a downstream application perspective; it would be nice to know that everyone can work together to write one simple tool chain to process these things and it would work everywhere. The current level of log output customization is part of what makes log analysis tools so much of a pain. How about this as a simple way to proceed: have the patch include everything, as Arul already planned. When it's done, do some benchmarking with it turned on or off. If it really seems like a drag, then consider a GUC addition to trim it down. Why optimize prematurely? It's not like this will be on by default. My guess is that the person sophisticated to analyze their logs probably has an installation that can support the overhead. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Wed, 21 Feb 2007 14:59, Greg Smith wrote: > On Tue, 20 Feb 2007, Tom Lane wrote: > > A smaller problem is that this forces people to incur a gettimeofday > > call for every message logged > > I'm stumped trying to think of an application that would require importing > the logs into a database to analyze them, but not need the timestamp. > I'd expect it to be the primary key on the data. > > > Is it worth providing a knob to determine the set of columns emitted? > > Myself and Guillaume felt that having the format be standardized had > significant value from a downstream application perspective; it would be Come to think of it, this may not be ideal after all. As we are triggering the sql output in log_destination, if the user gives 'syslog,sql' as options he is going to get two different looking logs (in terms of contents) depending upon his settings. But if we take the settings from log_line_prefix then the log contents are the same, plus it gives the user flexibility to control what he wants. If an user wants everything he only has to fill the log_line_prefix completely. Also, for a meaningful sql log output we may need to tell the user not to turn on verbose or print_plan or statistics etc... With a uniform log output it will be clear in that sense.. What he sets in .conf is what he gets, both in syslog and sql log. This may not be an optimization. Only an option which is there if any optimization is necessary. I am happy to implement it either way though. My requirement is same as yours. I want some sort of sql logging, pronto. Rgds, Arul Shaji > nice to know that everyone can work together to write one simple tool > chain to process these things and it would work everywhere. The current > level of log output customization is part of what makes log analysis tools > so much of a pain. > > How about this as a simple way to proceed: have the patch include > everything, as Arul already planned. When it's done, do some benchmarking > with it turned on or off. If it really seems like a drag, then consider a > GUC addition to trim it down. Why optimize prematurely? It's not like > this will be on by default. My guess is that the person sophisticated to > analyze their logs probably has an installation that can support the > overhead. This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive thisemail in error, please return to sender. Thank you. If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please emailunsubscribe@fast.fujitsu.com.au
On Thu, Feb 22, 2007 at 11:50:06AM +1100, FAST PostgreSQL wrote: > - The log output will be in COPY format and will include the following > information, irrespective of the log_line_prefix setting. > ( timestamp_with_milliseconds, timestamp, username, databasename, > sessionid, host_and_port, host, proc_id, command_tag, session_start, > transaction_id, error_severity, SQL_State_Code, statement/error_message); Why are there two timestamps? What about durations? Do they include all this? Is the statement log written when the statement starts or when it completes? -dg -- David Gould daveg@sonic.net If simplicity worked, the world would be overrun with insects.
On Thu, 22 Feb 2007, FAST PostgreSQL wrote: > As we are triggering the sql output in log_destination, if the user > gives 'syslog,sql' as options he is going to get two different looking > logs (in terms of contents) depending upon his settings. Yes, exactly; it's a good thing. People add and remove things from the text logs to make them easier to read. It's one of the reasons they're harder to process. Since readability isn't a requirement for the SQL formatted ones, you can pack a lot more into there and make it available easily anyway. I keep having every part of this conversation twice, so here's take two on this one. The things that people want out of the text logs are not necessarily the same things they want from the SQL ones. For example, I have a situation where the semantics of the syslog output is being driven by Sarbanes-Oxley related mechanics. But the SQL logs are be based on my requirements, which is to include close enough to everything that it might as well be the whole set, in case I forgot something I find I need later. The SQL logs are *completely* different from the syslog setup. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
What is the status of this patch? --------------------------------------------------------------------------- FAST PostgreSQL wrote: > Hi, > > I've been working on the following TODO item and attached is an initial patch. (It is only partial and not yet completelyfunctional) > > "Allow server log information to be output as INSERT statements > This would allow server log information to be easily loaded into a database for analysis. " > > I want to confirm, if what I have done so far is what community is looking for and also want to clear some doubts. > > What is done so far > --------------- > > Two postgresql.conf variables > > #log_output_type = 'text' #Valid values are 'SQL' or 'text' > #log_output_table_name = 'auditlogs' > > These control how to output the log. Defaults to 'text' which is status quo. If it is set to 'SQL' log will be output asINSERT commands. > > The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the userto give one and he can create it if and when he wants to. The alternative is we decide the table name and make initdbto create one. > > The proposed log output structure > ------------------ > INSERT INTO user_defined_table values( timestamp_with_milliseconds, timestamp, username, databasename, sessionid, host_and_port,host, proc_id, command_tag, session_start, transaction_id, error_severity, SQL_State_Code, error_message); > > All these columns will follow the current rules of log output. ie, unless explicity requested by the user, these columnswill have NULL. User can still give log_line_prefix in any order he wants, and logger will output it in appropriatecolumns. The code has been modified to do > this. > > Issues/Questions are: > - How about 'Statement duration log'. This will come to the logger as a single string and after the query execution. Inthe existing log we can make sense of the duration log by matching it with the statement above it or by the statement whichgets printed besides it (Again as > a single string). But when this is loaded onto a table doesn't make much sense untless everything is in a single row. (Mypreferred option is to add another column to the table structure defined above as 'duration'. But haven't figured outhow to achieve this, because the > statement is printed first and then the duration as another log.) > > - If the SQL log output is to the syslog, then it becomes pretty awkward and possibly useless because our current syslogwriter function breaks up the log into several lines to accomodate various platforms. Syslog also then adds other informationbefore outputting it, which > cannot be loaded onto a table. The preferred option is to educate the user through documentation that SQL type log outputis best served when it is output to stderr and redirected to a file? Same goes with other aspects such as verbose andvarious other statistics log. > > - There are also other minor issues such as, the actual query currently gets output in log as 'Statement: CREATE ........'.For sql type log we may not need the 'Statement:' part as it will be in a column ? Do we remove this in both textand SQL outputs ? > > Rgds, > Arul Shaji > > This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive thisemail in error, please return to sender. Thank you. > > If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please emailunsubscribe@fast.fujitsu.com.au [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +