Thread: WIP patch - INSERT-able log statements

WIP patch - INSERT-able log statements

From
"FAST PostgreSQL"
Date:
Hi,

I've been working on the following TODO item and attached is an initial patch. (It is only partial and not yet
completelyfunctional) 

"Allow server log information to be output as INSERT statements
This would allow server log information to be easily loaded into a database for analysis. "

I want to confirm, if what I have done so far is what community is looking for and also want to clear some doubts.

What is done so far
---------------

Two postgresql.conf variables

#log_output_type = 'text'        #Valid values are 'SQL' or 'text'
#log_output_table_name = 'auditlogs'

These control how to output the log. Defaults to 'text' which is status quo. If it is set to 'SQL' log will be output
asINSERT commands. 

The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the
userto give one and he can create it if and when he wants to. The alternative is we decide the table name and make
initdbto create one.   

The proposed log output structure
------------------
INSERT INTO user_defined_table values( timestamp_with_milliseconds,  timestamp, username,  databasename, sessionid,
host_and_port,host, proc_id, command_tag,  session_start, transaction_id,  error_severity,  SQL_State_Code,
error_message);

All these columns will follow the current rules of log output. ie, unless explicity requested by the user, these
columnswill have NULL. User can still give log_line_prefix in any order he wants, and logger will output it in
appropriatecolumns. The code has been modified to do  
this.

Issues/Questions are:
- How about 'Statement duration log'.  This will come to the logger as a single string and after the query execution.
Inthe existing log we can make sense of the duration log by matching it with the statement above it or by the statement
whichgets printed besides it (Again as  
a single string). But when this is loaded onto a table doesn't make much sense untless everything is in a single row.
(Mypreferred option is to add another column to the table structure defined above as 'duration'. But haven't figured
outhow to achieve this, because the  
statement is printed first and then the duration as another log.)

- If the SQL log output is to the syslog, then it becomes pretty awkward and possibly useless because our current
syslogwriter function breaks up the log into several lines to accomodate various platforms. Syslog also then adds other
informationbefore outputting it, which  
cannot be loaded onto a table. The preferred option is to educate the user through documentation that SQL type log
outputis best served when it is output to stderr and redirected to a file? Same goes with other aspects such as verbose
andvarious other statistics log.  

- There are also other minor issues such as, the actual query currently gets output in log as 'Statement: CREATE
........'.For sql type log we may not need the 'Statement:' part as it will be in a column ? Do we remove this in both
textand SQL outputs ? 

Rgds,
Arul Shaji

This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the
ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged
information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive
thisemail in error, please return to sender. Thank you. 

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please
emailunsubscribe@fast.fujitsu.com.au 

Attachment

Re: WIP patch - INSERT-able log statements

From
Tom Lane
Date:
"FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes:
> The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the
userto give one and he can create it if and when he wants to. The alternative is we decide the table name and make
initdbto create one.   

Why not output the data in COPY format instead?  That (a) eliminates the
problem of needing to predetermine a destination table name, and (b)
should be considerably faster to load than thousands of INSERT statements.

            regards, tom lane

Re: WIP patch - INSERT-able log statements

From
"FAST PostgreSQL"
Date:
On Fri, 16 Feb 2007 11:50, Tom Lane wrote:
> "FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes:
> > The second variable is of interest. We need to specify a table in the
> > insert command. My preferred option is for the user to give one and he
> > can create it if and when he wants to. The alternative is we decide the
> > table name and make initdb to create one.
>
> Why not output the data in COPY format instead?  That (a) eliminates the
> problem of needing to predetermine a destination table name, and (b)
> should be considerably faster to load than thousands of INSERT statements.

Yeah, that was my initial idea too... But because the TODO item clearly
mentions INSERT, I thought maybe there was some very specific reason for the
output to be in INSERT stmts.. ..

Rgds,
Arul Shaji



>
>             regards, tom lane
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the
ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged
information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive
thisemail in error, please return to sender. Thank you. 

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please
emailunsubscribe@fast.fujitsu.com.au 


Re: WIP patch - INSERT-able log statements

From
Tom Lane
Date:
"FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes:
> On Fri, 16 Feb 2007 11:50, Tom Lane wrote:
>> Why not output the data in COPY format instead?

> Yeah, that was my initial idea too... But because the TODO item clearly
> mentions INSERT, I thought maybe there was some very specific reason for the
> output to be in INSERT stmts.. ..

I don't think the TODO entry was particularly carefully thought out.

            regards, tom lane

Re: WIP patch - INSERT-able log statements

From
Alvaro Herrera
Date:
Tom Lane wrote:
> "FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes:
> > On Fri, 16 Feb 2007 11:50, Tom Lane wrote:
> >> Why not output the data in COPY format instead?
>
> > Yeah, that was my initial idea too... But because the TODO item clearly
> > mentions INSERT, I thought maybe there was some very specific reason for the
> > output to be in INSERT stmts.. ..
>
> I don't think the TODO entry was particularly carefully thought out.

... which is why discussing stuff in -hackers *before* coding tends to
be a good idea.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: WIP patch - INSERT-able log statements

From
"Joshua D. Drake"
Date:
FAST PostgreSQL wrote:
> On Fri, 16 Feb 2007 11:50, Tom Lane wrote:
>> "FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes:
>>> The second variable is of interest. We need to specify a table in the
>>> insert command. My preferred option is for the user to give one and he
>>> can create it if and when he wants to. The alternative is we decide the
>>> table name and make initdb to create one.
>> Why not output the data in COPY format instead?  That (a) eliminates the
>> problem of needing to predetermine a destination table name, and (b)
>> should be considerably faster to load than thousands of INSERT statements.
>
> Yeah, that was my initial idea too... But because the TODO item clearly
> mentions INSERT, I thought maybe there was some very specific reason for the
> output to be in INSERT stmts.. ..

COPY would be a good option, but INSERT is probably what I would use as
the default. The most use I see for this is something where I am tailing
out the log and inserting live into a log db...

Sincerely,

Joshua D. Drake

>
> Rgds,
> Arul Shaji
>
>
>
>>             regards, tom lane
> This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the
ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged
information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive
thisemail in error, please return to sender. Thank you. 
>
> If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please
emailunsubscribe@fast.fujitsu.com.au 
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: WIP patch - INSERT-able log statements

From
Greg Smith
Date:
On Sat, 17 Feb 2007, FAST PostgreSQL wrote:

> #log_output_type = 'text' #Valid values are 'SQL' or 'text'
> Defaults to 'text' which is status quo. If it is set to 'SQL' log will
> be output as INSERT commands.

This assumes someone wants either the INSERT-able logs or the current,
easily readable ones.  I know I don't want either--I want both.  There are
times I want to look through the logs with a text editor, there are times
where I want to query against them.

I would suggest treating this similarly to how the Windows eventlog is
handled:  made SQL INSERT format another option available to
log_destination, so it can be combined with the existing formats.  In
addition to the syslog concerns you already mentioned (which are
themselves a showstopper for using this feature in companies that rely on
or aggregate syslogs), I know I'd want to keep the existing logs rolling
in parallel while I tested out the SQL-based version for a while, before
cutting over to exclusively INSERT format logs.

I've thought a bit about how to implement this TODO already (I have a log
file parser and I hate maintaining it), and the only thing that made sense
to me was giving a new parameter with the filename to output to in this
format.  For example, make a new log_sql_filename with the same syntax
already used for log_filename.  There will probably need to be a second
parameter for the table name to insert into as you've already commented
on.  And like Joshua has already suggested, the main useful applications
for this feature I've thought of all involve reading from the INSERT-able
logs in real-time, using something like "tail -f", and pumping that data
immediately into a logger table.

Also, I feel that supporting the whole log_line_prefix syntax for this
feature is not just overkill, it's a bad idea.  Output everything in a
standard, complete format instead, and then it becomes easy for the
community at large to build tools on top of that to analyze the log
database entries instead of having so many ad-hoc approaches.  You want a
subset, use a view or copy just the fields you want into another table.
I would guess this simplifies the patch as well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: WIP patch - INSERT-able log statements

From
"Guillaume Smet"
Date:
Added -hackers to CC:.

On 2/18/07, Greg Smith <gsmith@gregsmith.com> wrote:
> I've thought a bit about how to implement this TODO already (I have a log
> file parser and I hate maintaining it)

Any problem using pgFouine?

> Also, I feel that supporting the whole log_line_prefix syntax for this
> feature is not just overkill, it's a bad idea.  Output everything in a
> standard, complete format instead, and then it becomes easy for the
> community at large to build tools on top of that to analyze the log
> database entries instead of having so many ad-hoc approaches.  You want a
> subset, use a view or copy just the fields you want into another table.
> I would guess this simplifies the patch as well.

I agree with you on this point. We need every information we can put
in the database (database, user and so on) in a structured way.

Logging statements is not the only thing to do IMHO. Logging errors in
another table is important too. I'm pretty sure there are a lot people
who don't know if there are errors in their statements.

Anyway, if something is developed to do that, I'll extend pgFouine to
support it (or I'll write another application to do it) and I'm pretty
sure others will do. The most annoying and difficult part of the work
on pgFouine/PQA/... is to maintain the parser. If we don't have to do
that anymore, we'll have more time to think about new features.
In fact, it was a long plan for pgFouine to be able to store the
results of its analysis in a database so it's a good news if it's
implemented in core.

--
Guillaume

Re: WIP patch - INSERT-able log statements

From
"FAST PostgreSQL"
Date:
On Sun, 18 Feb 2007 18:25, Greg Smith wrote:
> On Sat, 17 Feb 2007, FAST PostgreSQL wrote:
> > #log_output_type = 'text' #Valid values are 'SQL' or 'text'
> > Defaults to 'text' which is status quo. If it is set to 'SQL' log will
> > be output as INSERT commands.
>
> This assumes someone wants either the INSERT-able logs or the current,
> easily readable ones.  I know I don't want either--I want both.  There are
> times I want to look through the logs with a text editor, there are times
> where I want to query against them.
>
> I would suggest treating this similarly to how the Windows eventlog is
> handled:  made SQL INSERT format another option available to
> log_destination, so it can be combined with the existing formats.  In

I think adding the 'format' of the log as an option in the 'destination' may
be confusing.  We can have a new boolean variable like 'output_sql_log' or
'log_sql_format' which will trigger the output of INSERT-able log in addition
to syslog/stderr/eventlog in text format as it is now.

> addition to the syslog concerns you already mentioned (which are
> themselves a showstopper for using this feature in companies that rely on
> or aggregate syslogs), I know I'd want to keep the existing logs rolling
> in parallel while I tested out the SQL-based version for a while, before
> cutting over to exclusively INSERT format logs.
>
> I've thought a bit about how to implement this TODO already (I have a log

Any thoughts on how to output the duration in the same INSERT as that of the
query without having to output two entries per statement as it is done in
text now? ?

> file parser and I hate maintaining it), and the only thing that made sense
> to me was giving a new parameter with the filename to output to in this
> format.  For example, make a new log_sql_filename with the same syntax
> already used for log_filename.  There will probably need to be a second
> parameter for the table name to insert into as you've already commented
> on.  And like Joshua has already suggested, the main useful applications

This would mean that the user will have to maintain two sets of variable. One
for text output (if redirect stderr is on) and another for sql output. My
preference would be for the sql-logs to use the same variables as that of
redirect_stderr. In the directory mentioned by the user on log_directory we
just output the sql logs in a file with an '.SQL' appended to the filename
specified by the user in log_filename. This also means we can make use of the
log_truncation and log_rotate options (I hope. I will have a look at that).
Of course we document this making user aware of this. (The other option is we
output the logs in a subdirectory called 'SQL' which we can silently create,
if we are allowed to ! ! !)

This also means only two new variable is created in postgresql.conf for this
feature. To trigger the output and the table name. (Or we can introduce only
the table name variable which will automatically trigger the SQL log output.
Maybe confusing ? ? )

Rgds,
Arul Shaji


> for this feature I've thought of all involve reading from the INSERT-able
> logs in real-time, using something like "tail -f", and pumping that data
> immediately into a logger table.
>
> Also, I feel that supporting the whole log_line_prefix syntax for this
> feature is not just overkill, it's a bad idea.  Output everything in a
> standard, complete format instead, and then it becomes easy for the
> community at large to build tools on top of that to analyze the log
> database entries instead of having so many ad-hoc approaches.  You want a
> subset, use a view or copy just the fields you want into another table.
> I would guess this simplifies the patch as well.


This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the
ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged
information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive
thisemail in error, please return to sender. Thank you. 

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please
emailunsubscribe@fast.fujitsu.com.au 


Re: WIP patch - INSERT-able log statements

From
Greg Smith
Date:
On Tue, 20 Feb 2007, FAST PostgreSQL wrote:

> I think adding the 'format' of the log as an option in the 'destination'
> may be confusing.  We can have a new boolean variable like
> 'output_sql_log' or 'log_sql_format' which will trigger the output of
> INSERT-able log in addition to syslog/stderr/eventlog in text format as
> it is now.

What's confusing about it?  Consider this hypothetical postgresql.conf
snippet:

log_destination = 'stderr,sql'     # Valid values are combinations of
                     # stderr, syslog, sql, and eventlog,
                     # depending on platform.

# These are relevant when logging to sql:
log_sql_table = 'pg_log'  # Table SQL formatted logs INSERT into
                     # Default is 'pg_log'

Adding new GUC variables isn't without cost; no reason to add a new one
when there's a perfectly good destination one already whose syntax is
completely appropriate for this task.

I personally would just ignore the duration two entries per statement
problem and make that the log analyzer software's job to fix, but I'm
likely biased here because I don't ever do anything with that data.

> My preference would be for the sql-logs to use the same variables as
> that of redirect_stderr. In the directory mentioned by the user on
> log_directory we just output the sql logs in a file with an '.SQL'
> appended to the filename specified by the user in log_filename. This
> also means we can make use of the log_truncation and log_rotate options
> (I hope. I will have a look at that).

You're talking along the same lines here I was trying to suggest in my
previous message.  Keep all the current behavior as far as truncation and
rotation go, just provide a different name for the file.  If you just
appended a suffix like .sql to the existing name, that would remove
another GUC variable requirement.  I think eventually people will complain
about this, and want a separately formatted filename altogether, but
there's nothing wrong with the approach you suggest for a first version of
this feature.  I know I'm so desparate for a better solution to log
importing that I'd be happy with any workable improvement.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: WIP patch - INSERT-able log statements

From
"Guillaume Smet"
Date:
On 2/19/07, Greg Smith <gsmith@gregsmith.com> wrote:
> log_destination = 'stderr,sql'     # Valid values are combinations of
>                      # stderr, syslog, sql, and eventlog,
>                      # depending on platform.

+1

> # These are relevant when logging to sql:
> log_sql_table = 'pg_log'  # Table SQL formatted logs INSERT into
>                      # Default is 'pg_log'

Is there really a need for that? Why not simply put something like
%log_table% in the sql file and let the admin replace it with sed or
whatever he likes?
And it could allow us to move to copy without having to drop the code
added to manage the new GUC variable.

> I personally would just ignore the duration two entries per statement
> problem and make that the log analyzer software's job to fix, but I'm
> likely biased here because I don't ever do anything with that data.

We have basically 4 different cases:
* log_min_duration_statement = 0: every query is logged with the
duration on the same line,
* log_statement = all: we don't have any duration,
* log_statement = all & log_duration = on: we have every query and the
duration on another line,
* log_min_duration_statement = n > 0 & log_duration = on: we have
duration lines for every query and statement + duration if the query
is slower than n ms.

There are other variants but I think they are the main cases to deal with.

--
Guillaume

Re: WIP patch - INSERT-able log statements

From
Alvaro Herrera
Date:
Guillaume Smet escribió:
> On 2/19/07, Greg Smith <gsmith@gregsmith.com> wrote:
> >log_destination = 'stderr,sql'     # Valid values are combinations of
> >                     # stderr, syslog, sql, and eventlog,
> >                     # depending on platform.
>
> +1

Please don't do that.  We already have a "combined GUC option" that is
used to change two different things (DateStyle) and I regularly see
people confused about how to use it.

Also, "sql" is not really a destination -- it is a format.  Maybe have a
"format=plain,sql" GUC var is best: plain format is logged as currently,
and SQL format is logged somewhere else (maybe use the same name, and
stash a .sql suffix to the filename, or use .sql.log).  That way you can
choose to have one or the other, or both if you're really dumb.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: WIP patch - INSERT-able log statements

From
"Guillaume Smet"
Date:
On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Please don't do that.  We already have a "combined GUC option" that is
> used to change two different things (DateStyle) and I regularly see
> people confused about how to use it.

Perhaps I don't understand your "combined GUC option" but ISTM it's
already the case for log_destination. I don't use this ability but the
current comment in postgresql.conf seems to be self explanatory:
log_destination = 'syslog'              # Valid values are combinations of
                                        # stderr, syslog and eventlog,
                                        # depending on platform.

> Also, "sql" is not really a destination -- it is a format.  Maybe have a
> "format=plain,sql" GUC var is best: plain format is logged as currently,
> and SQL format is logged somewhere else (maybe use the same name, and
> stash a .sql suffix to the filename, or use .sql.log).  That way you can
> choose to have one or the other, or both if you're really dumb.

I don't see how you will deal with your format GUC variable. Or it
won't be possible to have plain and sql at the same time? My problem
is how you will separate plain and sql if you put them in the stderr
or syslog.

The configuration which may interest me at first is to have syslog
plain text output to run pgFouine daily and from time to time use the
SQL output (I won't be able to insert into a db my daily logs so I
will do it sometimes if I need more details, ability to look for a
particular query and so on).
I don't know how other people plan to use it though.

--
Guillaume

Re: WIP patch - INSERT-able log statements

From
Alvaro Herrera
Date:
Guillaume Smet escribió:
> On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> >Please don't do that.  We already have a "combined GUC option" that is
> >used to change two different things (DateStyle) and I regularly see
> >people confused about how to use it.
>
> Perhaps I don't understand your "combined GUC option" but ISTM it's
> already the case for log_destination. I don't use this ability but the
> current comment in postgresql.conf seems to be self explanatory:
> log_destination = 'syslog'              # Valid values are combinations of
>                                        # stderr, syslog and eventlog,
>                                        # depending on platform.

That defines one thing: where does the log output go.  It can be syslog,
stderr, eventlog, or any combination thereof.  But it's only one thing
that you're configuring.

In datestyle, you set two things: output format and input format.
That's problematic because people changes one of the settings and then
expect the other to change too, and get confused because it doesn't
work.  What I'm saying is that this was not a very good design and
please let's not repeat the mistake here.

> >Also, "sql" is not really a destination -- it is a format.  Maybe have a
> >"format=plain,sql" GUC var is best: plain format is logged as currently,
> >and SQL format is logged somewhere else (maybe use the same name, and
> >stash a .sql suffix to the filename, or use .sql.log).  That way you can
> >choose to have one or the other, or both if you're really dumb.
>
> I don't see how you will deal with your format GUC variable. Or it
> won't be possible to have plain and sql at the same time? My problem
> is how you will separate plain and sql if you put them in the stderr
> or syslog.

Well, to syslog or eventlog you don't send the SQL format, only plain.
If you configure the SQL format, then it's expected that stderr is
enabled, so it would be an error if you enable SQL format but not
stderr.  If redirect_stderr is also enabled then you could enable both
plain format and SQL format, and have each logged to a different file
(though I repeat that IMHO that would be a dumb thing to do).


> The configuration which may interest me at first is to have syslog
> plain text output to run pgFouine daily and from time to time use the
> SQL output (I won't be able to insert into a db my daily logs so I
> will do it sometimes if I need more details, ability to look for a
> particular query and so on).
> I don't know how other people plan to use it though.

I don't think the syslog option is so nice to use these days; the
redirect_stderr stuff is more powerful and easy to use.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: WIP patch - INSERT-able log statements

From
"Guillaume Smet"
Date:
On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I don't think the syslog option is so nice to use these days; the
> redirect_stderr stuff is more powerful and easy to use.

Did you already analyze logs of a highly loaded platform using stderr?
It's impossible to guarantee the consistency of the queries because
you can have:
Query 1 Line 1
Query 2 Line 1
Query 3 Line 1
Query 1 Line 2
Query 2 Line 2
Query 1 Line 3
Query 3 Line 2
(each query is run on a different backend). With stderr you don't have
sufficient information to know the query you should append the text
to. Syslog adds useful context information (pid, command line, command
number) which allows you to guarantee the consistency.
It's a real problem for us and that's why I don't recommend using
stderr when you use pgFouine.
Moreover, syslog can send the log to the network which moves the I/O
needed to log on another box and it's necessary in our case.

--
Guillaume

Re: WIP patch - INSERT-able log statements

From
Alvaro Herrera
Date:
Guillaume Smet escribió:
> On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> >I don't think the syslog option is so nice to use these days; the
> >redirect_stderr stuff is more powerful and easy to use.
>
> Did you already analyze logs of a highly loaded platform using stderr?
> It's impossible to guarantee the consistency of the queries because
> you can have:
> Query 1 Line 1
> Query 2 Line 1
> Query 3 Line 1
> Query 1 Line 2
> Query 2 Line 2
> Query 1 Line 3
> Query 3 Line 2
> (each query is run on a different backend). With stderr you don't have
> sufficient information to know the query you should append the text
> to. Syslog adds useful context information (pid, command line, command
> number) which allows you to guarantee the consistency.

So add the session ID (%c) to log_line_prefix.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: WIP patch - INSERT-able log statements

From
"Guillaume Smet"
Date:
On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> So add the session ID (%c) to log_line_prefix.

It could work if log_line_prefix was added before every line but it's
definitely not the case:
myuser mydb 45d9d615.4abe LOG:  duration : 185.223 ms, statement : SELECT *
        FROM lieu
        LIMIT 10;
if you execute:
SELECT *
FROM lieu
LIMIT 10;

I worked a *lot* on query logging and I really didn't find any
solution to use stderr in a safe way, reason why I don't use it at all
and I don't recommend it (even if pgFouine supports it, it outputs a
warning as soon as the parser finds a multiline query).

--
Guillaume

Re: WIP patch - INSERT-able log statements

From
Alvaro Herrera
Date:
Guillaume Smet escribió:
> On 2/19/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> >So add the session ID (%c) to log_line_prefix.
>
> It could work if log_line_prefix was added before every line but it's
> definitely not the case:
> myuser mydb 45d9d615.4abe LOG:  duration : 185.223 ms, statement : SELECT *
>        FROM lieu
>        LIMIT 10;
> if you execute:
> SELECT *
> FROM lieu
> LIMIT 10;

Interesting.  I wonder why didn't you report this as a bug before?
Maybe we could have discussed it and fixed it.

This is irrelevant in this particular discussion anyway if we introduce
SQL format, because then the newline should be part of the COPY or INSERT
data.  Let's just make sure to not make the same errors again.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: WIP patch - INSERT-able log statements

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> (each query is run on a different backend). With stderr you don't have
> sufficient information to know the query you should append the text
> to. Syslog adds useful context information (pid, command line, command
> number) which allows you to guarantee the consistency.
> It's a real problem for us and that's why I don't recommend using
> stderr when you use pgFouine.

Of course, the other side of that coin is that syslog is known to drop
messages altogether under sufficient load.  (At least on some platforms;
dunno about yours.)

            regards, tom lane

Re: WIP patch - INSERT-able log statements

From
Alvaro Herrera
Date:
Tom Lane escribió:
> "Guillaume Smet" <guillaume.smet@gmail.com> writes:
> > (each query is run on a different backend). With stderr you don't have
> > sufficient information to know the query you should append the text
> > to. Syslog adds useful context information (pid, command line, command
> > number) which allows you to guarantee the consistency.
> > It's a real problem for us and that's why I don't recommend using
> > stderr when you use pgFouine.
>
> Of course, the other side of that coin is that syslog is known to drop
> messages altogether under sufficient load.  (At least on some platforms;
> dunno about yours.)

So lets fix stderr by having a true per-line prefix?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: WIP patch - INSERT-able log statements

From
"Guillaume Smet"
Date:
On 2/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Of course, the other side of that coin is that syslog is known to drop
> messages altogether under sufficient load.  (At least on some platforms;
> dunno about yours.)

Yes I know. That's one of the reason why I asked for the bahaviour of
7.4 log_duration back in 8.2. It's a good compromise which allows us
not to lose lines and have a good level of information (at least, the
best we can have). Async IO helps.

Moreover we use syslog to send the log lines via UDP so we know that
it's not perfect. But it works nice most of the time. We know that we
can't log every query (we use a combination of log_duration and
log_min_duration_statement - I patched 8.1 for that) because if we do
so we lose a lot of lines and queries are not consistent but we can't
do it locally with stderr anyway due to I/O. This method has been
reliable for more than a year and our daily reports are consistent.

--
Guillaume

Re: WIP patch - INSERT-able log statements

From
"FAST PostgreSQL"
Date:
Ok..... Summarizing the key changes required on my patch, based on the
discussions so far are :

- The log_destination will include a new option 'sql'. This can be given with
other combinations of stderr, syslog or eventlog.
- The sql logs will be written in log_directory in a file log_filename.SQL
- The log output will be in COPY format and will include the following
information, irrespective of the log_line_prefix setting.
( timestamp_with_milliseconds,  timestamp, username,  databasename,
sessionid,  host_and_port, host, proc_id, command_tag,  session_start,
transaction_id,  error_severity,  SQL_State_Code, statement/error_message);

Anything else missing ? ?

Rgds,
Arul Shaji


On Tue, 20 Feb 2007 19:03, Guillaume Smet wrote:
> On 2/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Of course, the other side of that coin is that syslog is known to drop
> > messages altogether under sufficient load.  (At least on some platforms;
> > dunno about yours.)
>
> Yes I know. That's one of the reason why I asked for the bahaviour of
> 7.4 log_duration back in 8.2. It's a good compromise which allows us
> not to lose lines and have a good level of information (at least, the
> best we can have). Async IO helps.
>
> Moreover we use syslog to send the log lines via UDP so we know that
> it's not perfect. But it works nice most of the time. We know that we
> can't log every query (we use a combination of log_duration and
> log_min_duration_statement - I patched 8.1 for that) because if we do
> so we lose a lot of lines and queries are not consistent but we can't
> do it locally with stderr anyway due to I/O. This method has been
> reliable for more than a year and our daily reports are consistent.

This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the
ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged
information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive
thisemail in error, please return to sender. Thank you. 

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please
emailunsubscribe@fast.fujitsu.com.au 


Re: WIP patch - INSERT-able log statements

From
Tom Lane
Date:
"FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes:
> - The log output will be in COPY format and will include the following
> information, irrespective of the log_line_prefix setting.
> ( timestamp_with_milliseconds, �timestamp, username, �databasename,
> sessionid, �host_and_port, host, proc_id, command_tag, �session_start,
> transaction_id, �error_severity, �SQL_State_Code, statement/error_message);

How exactly are you fitting the message structure
(primary/detail/context lines) into this?  It looks like your proposal
loses that structure ...

A smaller problem is that this forces people to incur a gettimeofday
call for every message logged; depending on your hardware that can be a
pretty nasty overhead.  Some people might find some of the other columns
not worth their weight, either.  Is it worth providing a knob to
determine the set of columns emitted?

            regards, tom lane

Re: WIP patch - INSERT-able log statements

From
"FAST PostgreSQL"
Date:
On Wed, 21 Feb 2007 12:08, Tom Lane wrote:
> "FAST PostgreSQL" <fastpgs@fast.fujitsu.com.au> writes:
> > - The log output will be in COPY format and will include the following
> > information, irrespective of the log_line_prefix setting.
> > ( timestamp_with_milliseconds,  timestamp, username,  databasename,
> > sessionid,  host_and_port, host, proc_id, command_tag,  session_start,
> > transaction_id,  error_severity,  SQL_State_Code,
> > statement/error_message);
>
> How exactly are you fitting the message structure
> (primary/detail/context lines) into this?  It looks like your proposal
> loses that structure ...

Sorry, didn't understand.... Can you please elaborate ?

> A smaller problem is that this forces people to incur a gettimeofday
> call for every message logged; depending on your hardware that can be a
> pretty nasty overhead.  Some people might find some of the other columns
> not worth their weight, either.  Is it worth providing a knob to
> determine the set of columns emitted?

Totally agree. My original patch infact uses log_line_prefix. So the user can
fill in the columns he wants by turning on appropriate settings in
log_line_prefix. The columns which he hasn't turned on will be output as NULL
in the sql outptut.

But I can also see merit in others' request that it is ideal to have all
possible info in the sql log, so that once the log is loaded into the table,
it can be queried, sub-tabled, created view to analyze it in whatever way.....

Rgds,
Arul Shaji


>
>             regards, tom lane
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the
ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged
information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive
thisemail in error, please return to sender. Thank you. 

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please
emailunsubscribe@fast.fujitsu.com.au 


Re: WIP patch - INSERT-able log statements

From
Greg Smith
Date:
On Tue, 20 Feb 2007, Tom Lane wrote:

> A smaller problem is that this forces people to incur a gettimeofday
> call for every message logged

I'm stumped trying to think of an application that would require importing
the logs into a database to analyze them, but not need the timestamp.
I'd expect it to be the primary key on the data.

> Is it worth providing a knob to determine the set of columns emitted?

Myself and Guillaume felt that having the format be standardized had
significant value from a downstream application perspective; it would be
nice to know that everyone can work together to write one simple tool
chain to process these things and it would work everywhere.  The current
level of log output customization is part of what makes log analysis tools
so much of a pain.

How about this as a simple way to proceed:  have the patch include
everything, as Arul already planned.  When it's done, do some benchmarking
with it turned on or off.  If it really seems like a drag, then consider a
GUC addition to trim it down.  Why optimize prematurely?  It's not like
this will be on by default. My guess is that the person sophisticated to
analyze their logs probably has an installation that can support the
overhead.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: WIP patch - INSERT-able log statements

From
"FAST PostgreSQL"
Date:
On Wed, 21 Feb 2007 14:59, Greg Smith wrote:
> On Tue, 20 Feb 2007, Tom Lane wrote:
> > A smaller problem is that this forces people to incur a gettimeofday
> > call for every message logged
>
> I'm stumped trying to think of an application that would require importing
> the logs into a database to analyze them, but not need the timestamp.
> I'd expect it to be the primary key on the data.
>
> > Is it worth providing a knob to determine the set of columns emitted?
>
> Myself and Guillaume felt that having the format be standardized had
> significant value from a downstream application perspective; it would be

Come to think of it, this may not be ideal after all. As we are triggering
the sql output in log_destination, if the user gives 'syslog,sql' as options
he is going to get two different looking logs (in terms of contents)
depending upon his settings.

But if we take the settings from log_line_prefix then the log contents are
the same, plus it gives the user flexibility to control what he wants. If an
user wants everything he only has to fill the log_line_prefix completely.

Also, for a meaningful sql log output we may need to tell the user not to
turn on verbose or print_plan or statistics etc...  With a uniform log output
it will be clear in that sense.. What he sets in .conf is what he gets, both
in syslog and sql log. This may not be an optimization. Only an option which
is there if any optimization is necessary.

I am happy to implement it either way though. My requirement is same as
yours. I want some sort of sql logging, pronto.

Rgds,
Arul Shaji


> nice to know that everyone can work together to write one simple tool
> chain to process these things and it would work everywhere.  The current
> level of log output customization is part of what makes log analysis tools
> so much of a pain.
>
> How about this as a simple way to proceed:  have the patch include
> everything, as Arul already planned.  When it's done, do some benchmarking
> with it turned on or off.  If it really seems like a drag, then consider a
> GUC addition to trim it down.  Why optimize prematurely?  It's not like
> this will be on by default. My guess is that the person sophisticated to
> analyze their logs probably has an installation that can support the
> overhead.
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the
ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged
information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive
thisemail in error, please return to sender. Thank you. 

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please
emailunsubscribe@fast.fujitsu.com.au 


Re: WIP patch - INSERT-able log statements

From
daveg
Date:
On Thu, Feb 22, 2007 at 11:50:06AM +1100, FAST PostgreSQL wrote:
> - The log output will be in COPY format and will include the following
> information, irrespective of the log_line_prefix setting.
> ( timestamp_with_milliseconds,  timestamp, username,  databasename,
> sessionid,  host_and_port, host, proc_id, command_tag,  session_start,
> transaction_id,  error_severity,  SQL_State_Code, statement/error_message);

Why are there two timestamps?

What about durations? Do they include all this?

Is the statement log written when the statement starts or when it completes?

-dg

--
David Gould                      daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

Re: WIP patch - INSERT-able log statements

From
Greg Smith
Date:
On Thu, 22 Feb 2007, FAST PostgreSQL wrote:

> As we are triggering the sql output in log_destination, if the user
> gives 'syslog,sql' as options he is going to get two different looking
> logs (in terms of contents) depending upon his settings.

Yes, exactly; it's a good thing.  People add and remove things from the
text logs to make them easier to read.  It's one of the reasons they're
harder to process.  Since readability isn't a requirement for the SQL
formatted ones, you can pack a lot more into there and make it available
easily anyway.

I keep having every part of this conversation twice, so here's take two on
this one.  The things that people want out of the text logs are not
necessarily the same things they want from the SQL ones.  For example, I
have a situation where the semantics of the syslog output is being driven
by Sarbanes-Oxley related mechanics.  But the SQL logs are be based on my
requirements, which is to include close enough to everything that it might
as well be the whole set, in case I forgot something I find I need later.
The SQL logs are *completely* different from the syslog setup.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: WIP patch - INSERT-able log statements

From
Bruce Momjian
Date:
What is the status of this patch?

---------------------------------------------------------------------------

FAST PostgreSQL wrote:
> Hi,
>
> I've been working on the following TODO item and attached is an initial patch. (It is only partial and not yet
completelyfunctional) 
>
> "Allow server log information to be output as INSERT statements
> This would allow server log information to be easily loaded into a database for analysis. "
>
> I want to confirm, if what I have done so far is what community is looking for and also want to clear some doubts.
>
> What is done so far
> ---------------
>
> Two postgresql.conf variables
>
> #log_output_type = 'text'        #Valid values are 'SQL' or 'text'
> #log_output_table_name = 'auditlogs'
>
> These control how to output the log. Defaults to 'text' which is status quo. If it is set to 'SQL' log will be output
asINSERT commands. 
>
> The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the
userto give one and he can create it if and when he wants to. The alternative is we decide the table name and make
initdbto create one.   
>
> The proposed log output structure
> ------------------
> INSERT INTO user_defined_table values( timestamp_with_milliseconds,  timestamp, username,  databasename, sessionid,
host_and_port,host, proc_id, command_tag,  session_start, transaction_id,  error_severity,  SQL_State_Code,
error_message);
>
> All these columns will follow the current rules of log output. ie, unless explicity requested by the user, these
columnswill have NULL. User can still give log_line_prefix in any order he wants, and logger will output it in
appropriatecolumns. The code has been modified to do  
> this.
>
> Issues/Questions are:
> - How about 'Statement duration log'.  This will come to the logger as a single string and after the query execution.
Inthe existing log we can make sense of the duration log by matching it with the statement above it or by the statement
whichgets printed besides it (Again as  
> a single string). But when this is loaded onto a table doesn't make much sense untless everything is in a single row.
(Mypreferred option is to add another column to the table structure defined above as 'duration'. But haven't figured
outhow to achieve this, because the  
> statement is printed first and then the duration as another log.)
>
> - If the SQL log output is to the syslog, then it becomes pretty awkward and possibly useless because our current
syslogwriter function breaks up the log into several lines to accomodate various platforms. Syslog also then adds other
informationbefore outputting it, which  
> cannot be loaded onto a table. The preferred option is to educate the user through documentation that SQL type log
outputis best served when it is output to stderr and redirected to a file? Same goes with other aspects such as verbose
andvarious other statistics log.  
>
> - There are also other minor issues such as, the actual query currently gets output in log as 'Statement: CREATE
........'.For sql type log we may not need the 'Statement:' part as it will be in a column ? Do we remove this in both
textand SQL outputs ? 
>
> Rgds,
> Arul Shaji
>
> This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the
ordinaryuser of the email address to which it was addressed and may contain copyright and/or legally privileged
information.No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive
thisemail in error, please return to sender. Thank you. 
>
> If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please
emailunsubscribe@fast.fujitsu.com.au 

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +