Thread: TODO list

TODO list

From
Andrew Dunstan
Date:
2 things.

I submitted a patch for this 5 months ago, which is still waiting to be 
merged (hope it hasn't bitrotted in the meantime):

. Allow log lines to include session-level information, like database 
and user

If nobody is working on this I am prepared to look at it:

. Allow logging of only data definition(DDL), or DDL and modification 
statements

cheers

andrew



logging statement levels

From
Andrew Dunstan
Date:
I wrote:

>
>
> If nobody is working on this I am prepared to look at it:
>
> . Allow logging of only data definition(DDL), or DDL and modification 
> statements
>



Here are some options:

1. change the type of "log_statement" option from boolean to string, 
with allowed values of "all, mod, ddl, none" with default "none".
2. same as 1. but make boolean true values synonyms for "all" and 
boolean false values synonyms for "none".
3. keep "log_statement" option as now and add a new option 
"log_statement_level" with the same options as 1. but default to "all", 
which will have no effect unless "log_statement" is true.

Also, I assume "modification statements" means insert/update/delete, or 
are we talking about DDL mods (like "alter table")?

Finally, what about functions that have side effects? It would be nice 
to be able to detect the statements to be logged at the syntactic level, 
but it strikes me that that might not be possible.

cheers

andrew






Re: TODO list

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> 
> 2 things.
> 
> I submitted a patch for this 5 months ago, which is still waiting to be 
> merged (hope it hasn't bitrotted in the meantime):
> 
> . Allow log lines to include session-level information, like database 
> and user
> 
> If nobody is working on this I am prepared to look at it:
> 
> . Allow logging of only data definition(DDL), or DDL and modification 
> statements

Yes, sorry I haven't gotten back to that, and yes the DDL idea is a good
one.

For the log idea, I think we need to get a way to merge all the per-line
info into one setup, so pid, timestamp, user, etc would all be
configurable using your setup.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: TODO list

From
"Andrew Dunstan"
Date:
Bruce Momjian said:
> Andrew Dunstan wrote:
>>
>> 2 things.
>>
>> I submitted a patch for this 5 months ago, which is still waiting to
be  merged (hope it hasn't bitrotted in the meantime):
>>
>> . Allow log lines to include session-level information, like database
and user
>>
>> If nobody is working on this I am prepared to look at it:
>>
>> . Allow logging of only data definition(DDL), or DDL and modification
statements
>
> Yes, sorry I haven't gotten back to that, and yes the DDL idea is a good
one.
>
> For the log idea, I think we need to get a way to merge all the
> per-line info into one setup, so pid, timestamp, user, etc would all be
configurable using your setup.
>

I thought we had thrashed this out back in August. Certainly the only
thing I recall seeing after I submitted the patch was some stylistic
criticism from Neil, which I addressed in a revised patch.

Anyway, it is in principle doable. That's partly why I adopted a printf
style format string. There are some wrinkles, though:
. interaction with syslog pid/timestamp logging
. making sure the info is available when you need to log it - I had to
rearrange a few thing to avoid getting SEGVs, IIRC.

Also, the session duration logging part of the patch is orthogonal to the
issue.

cheers

andrew





Re: TODO list

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> Bruce Momjian said:
> > Andrew Dunstan wrote:
> >>
> >> 2 things.
> >>
> >> I submitted a patch for this 5 months ago, which is still waiting to
> be  merged (hope it hasn't bitrotted in the meantime):
> >>
> >> . Allow log lines to include session-level information, like database
> and user
> >>
> >> If nobody is working on this I am prepared to look at it:
> >>
> >> . Allow logging of only data definition(DDL), or DDL and modification
> statements
> >
> > Yes, sorry I haven't gotten back to that, and yes the DDL idea is a good
> one.
> >
> > For the log idea, I think we need to get a way to merge all the
> > per-line info into one setup, so pid, timestamp, user, etc would all be
> configurable using your setup.
> >
> 
> I thought we had thrashed this out back in August. Certainly the only
> thing I recall seeing after I submitted the patch was some stylistic
> criticism from Neil, which I addressed in a revised patch.
> 
> Anyway, it is in principle doable. That's partly why I adopted a printf
> style format string. There are some wrinkles, though:
> . interaction with syslog pid/timestamp logging

Yes.  If you use syslog, just don't ask for pid/timestamp and let syslog
do it.  Of course, right now we are able to send non-pid/timestamp to
syslog _and_ send pid/timestamp to a log file, but that seems like a
rare operation that doesn't justify keeping the various log parameters
separate.

Also, I would like to see some kind of session identifier that is more
unique than pid, which wraps around.  Ideally we could have 10{pid},
then then the pid wraps around, 20{pid), or something like that.

> . making sure the info is available when you need to log it - I had to
> rearrange a few thing to avoid getting SEGVs, IIRC.

Of course some messages, like postmaster status messages, don't have
some of these fields, like username or host.  Is that going to cause
problems for tools that read our log files?

> Also, the session duration logging part of the patch is orthogonal to the
> issue.

You mean query duration?  Yes, it is orthoginal.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: TODO list

From
Andrew Dunstan
Date:

Bruce Momjian wrote:

>Andrew Dunstan wrote:
>  
>
>>I thought we had thrashed this out back in August. Certainly the only
>>thing I recall seeing after I submitted the patch was some stylistic
>>criticism from Neil, which I addressed in a revised patch.
>>
>>Anyway, it is in principle doable. That's partly why I adopted a printf
>>style format string. There are some wrinkles, though:
>>. interaction with syslog pid/timestamp logging
>>    
>>
>
>Yes.  If you use syslog, just don't ask for pid/timestamp and let syslog
>do it.  Of course, right now we are able to send non-pid/timestamp to
>syslog _and_ send pid/timestamp to a log file, but that seems like a
>rare operation that doesn't justify keeping the various log parameters
>separate.
>



I'm OK with that as long as it is the consensus view. It does seem a 
little odd to remove functionality (however rare) for the sake of 
configuration neatness, though.

>
>Also, I would like to see some kind of session identifier that is more
>unique than pid, which wraps around.  Ideally we could have 10{pid},
>then then the pid wraps around, 20{pid), or something like that.
>


This requires some thought. ISTM it wouldn't buy you much unless you 
made it persistent across server restarts, and possibly not even then. I 
don't see it as a reason to hold up these features, though. If someone 
wants to tackle this it could be plugged in to the loginfo feature very 
easily as an extra escape sequence.

>
>  
>
>>. making sure the info is available when you need to log it - I had to
>>rearrange a few thing to avoid getting SEGVs, IIRC.
>>    
>>
>
>Of course some messages, like postmaster status messages, don't have
>some of these fields, like username or host.  Is that going to cause
>problems for tools that read our log files?
>


If users want a non-empty info string they will have to teach the tools 
to handle it anyway. The point was, however, that rolling up PID and 
timestamp into the printf-style format will require some significant 
work, because we wouldn't want to lose that info if the user/db weren't 
known, whereas the patch currently suppresses all log-info output if 
these are not present (i.e. when MyProcPort == NULL).

>
>  
>
>>Also, the session duration logging part of the patch is orthogonal to the
>>issue.
>>    
>>
>
>You mean query duration?  Yes, it is orthoginal.
>  
>


No, I meant the logging of the end of a session, including its duration, 
which was also in the patch.

cheers

andrew





Re: TODO list

From
Jon Jensen
Date:
On Tue, 6 Jan 2004, Andrew Dunstan wrote:

> >Also, I would like to see some kind of session identifier that is more
> >unique than pid, which wraps around.  Ideally we could have 10{pid},
> >then then the pid wraps around, 20{pid), or something like that.
> 
> This requires some thought. ISTM it wouldn't buy you much unless you 
> made it persistent across server restarts, and possibly not even then.

And on OpenBSD (though no other platforms that I know of) the PID is a
random number, so there is no "wrapping" to begin with.

Jon


Re: TODO list

From
Mario Weilguni
Date:
Am Tuesday 06 January 2004 21:30 schrieb Jon Jensen:
> On Tue, 6 Jan 2004, Andrew Dunstan wrote:
> > >Also, I would like to see some kind of session identifier that is more
> > >unique than pid, which wraps around.  Ideally we could have 10{pid},
> > >then then the pid wraps around, 20{pid), or something like that.
> >
> > This requires some thought. ISTM it wouldn't buy you much unless you
> > made it persistent across server restarts, and possibly not even then.
>
> And on OpenBSD (though no other platforms that I know of) the PID is a
> random number, so there is no "wrapping" to begin with.

Linux >= 2.6 has random pids too.


Re: TODO list

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> 
> 
> Bruce Momjian wrote:
> 
> >Andrew Dunstan wrote:
> >  
> >
> >>I thought we had thrashed this out back in August. Certainly the only
> >>thing I recall seeing after I submitted the patch was some stylistic
> >>criticism from Neil, which I addressed in a revised patch.
> >>
> >>Anyway, it is in principle doable. That's partly why I adopted a printf
> >>style format string. There are some wrinkles, though:
> >>. interaction with syslog pid/timestamp logging
> >>    
> >>
> >
> >Yes.  If you use syslog, just don't ask for pid/timestamp and let syslog
> >do it.  Of course, right now we are able to send non-pid/timestamp to
> >syslog _and_ send pid/timestamp to a log file, but that seems like a
> >rare operation that doesn't justify keeping the various log parameters
> >separate.
> >
> 
> 
> 
> I'm OK with that as long as it is the consensus view. It does seem a 
> little odd to remove functionality (however rare) for the sake of 
> configuration neatness, though.

Yes, agreed.  Let's explore it.  The rare functionality we would be
removing is for:o  User uses syslogo  User wants to log postmaster output to a file tooo  User wants timestamp info in
thepostmaster output file
 

And the downside is that they get duplicate timestamps in syslog.  

Now, if we don't merge timestamp into your new per-line log string, we
end up with a rather illogical and inflexible output format, only to
allow the rare case listed above.

Clearly this isn't a 100% clear decision, but it seems to lean in the
direction of removing the functionality with the goal of providing a
more logical logging API to the users.

> >Also, I would like to see some kind of session identifier that is more
> >unique than pid, which wraps around.  Ideally we could have 10{pid},
> >then then the pid wraps around, 20{pid), or something like that.
> >
> 
> 
> This requires some thought. ISTM it wouldn't buy you much unless you 
> made it persistent across server restarts, and possibly not even then. I 
> don't see it as a reason to hold up these features, though. If someone 
> wants to tackle this it could be plugged in to the loginfo feature very 
> easily as an extra escape sequence.

Yes, that was my idea --- let's get this in and we can then add a
session id, and your point about restarts is a good one.

> >>. making sure the info is available when you need to log it - I had to
> >>rearrange a few thing to avoid getting SEGVs, IIRC.
> >>    
> >>
> >
> >Of course some messages, like postmaster status messages, don't have
> >some of these fields, like username or host.  Is that going to cause
> >problems for tools that read our log files?
> >
> 
> 
> If users want a non-empty info string they will have to teach the tools 
> to handle it anyway. The point was, however, that rolling up PID and 
> timestamp into the printf-style format will require some significant 
> work, because we wouldn't want to lose that info if the user/db weren't 
> known, whereas the patch currently suppresses all log-info output if 
> these are not present (i.e. when MyProcPort == NULL).

Oh, good point.  That would suggest that maybe we are better off leaving
pid and timestamp as separate options and _not_ merge them into your new
string.  I am starting to think having your string print only
session-specific information is the best way.

I wonder if we should rename your GUC log_session_line or something like
that.

> >>Also, the session duration logging part of the patch is orthogonal to the
> >>issue.
> >
> >You mean query duration?  Yes, it is orthoginal.
> 
> No, I meant the logging of the end of a session, including its duration, 
> which was also in the patch.

Oh, I missed that one.  It seems like a nice addition.  I see you added
it when they ask for log_connections.  Good idea.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 



Re: TODO list

From
Andrew Dunstan
Date:

Jon Jensen wrote:

>On Tue, 6 Jan 2004, Andrew Dunstan wrote:
>
>  
>
>>>Also, I would like to see some kind of session identifier that is more
>>>unique than pid, which wraps around.  Ideally we could have 10{pid},
>>>then then the pid wraps around, 20{pid), or something like that.
>>>      
>>>
>>This requires some thought. ISTM it wouldn't buy you much unless you 
>>made it persistent across server restarts, and possibly not even then.
>>    
>>
>
>And on OpenBSD (though no other platforms that I know of) the PID is a
>random number, so there is no "wrapping" to begin with.
>
>  
>

OK, so a sessionid based on prefix+pid won't work portably. If we 
*really* want to do it, a cluster-wide sequence generator would probably 
be the way to go, but I suspect that with the ability to log session 
termination explicitly (which I have already provided) much of the 
supposed extra utility would disappear anyway.

cheers

andrew



Re: TODO list

From
Andrew Dunstan
Date:

Bruce Momjian wrote:

>Andrew Dunstan wrote:
>  
>
>>Bruce Momjian wrote:
>>
>>    
>>
>>>Andrew Dunstan wrote:
>>> 
>>>
>>>      
>>>
>>>>I thought we had thrashed this out back in August. Certainly the only
>>>>thing I recall seeing after I submitted the patch was some stylistic
>>>>criticism from Neil, which I addressed in a revised patch.
>>>>
>>>>Anyway, it is in principle doable. That's partly why I adopted a printf
>>>>style format string. There are some wrinkles, though:
>>>>. interaction with syslog pid/timestamp logging
>>>>   
>>>>
>>>>        
>>>>
>>>Yes.  If you use syslog, just don't ask for pid/timestamp and let syslog
>>>do it.  Of course, right now we are able to send non-pid/timestamp to
>>>syslog _and_ send pid/timestamp to a log file, but that seems like a
>>>rare operation that doesn't justify keeping the various log parameters
>>>separate.
>>>
>>>      
>>>
>>
>>I'm OK with that as long as it is the consensus view. It does seem a 
>>little odd to remove functionality (however rare) for the sake of 
>>configuration neatness, though.
>>    
>>
>
>Yes, agreed.  Let's explore it.  The rare functionality we would be
>removing is for:
>    
>    o  User uses syslog
>    o  User wants to log postmaster output to a file too
>    o  User wants timestamp info in the postmaster output file
>
>And the downside is that they get duplicate timestamps in syslog.  
>
>Now, if we don't merge timestamp into your new per-line log string, we
>end up with a rather illogical and inflexible output format, only to
>allow the rare case listed above.
>
>Clearly this isn't a 100% clear decision, but it seems to lean in the
>direction of removing the functionality with the goal of providing a
>more logical logging API to the users.
>
>  
>
>>>Also, I would like to see some kind of session identifier that is more
>>>unique than pid, which wraps around.  Ideally we could have 10{pid},
>>>then then the pid wraps around, 20{pid), or something like that.
>>>
>>>      
>>>
>>This requires some thought. ISTM it wouldn't buy you much unless you 
>>made it persistent across server restarts, and possibly not even then. I 
>>don't see it as a reason to hold up these features, though. If someone 
>>wants to tackle this it could be plugged in to the loginfo feature very 
>>easily as an extra escape sequence.
>>    
>>
>
>Yes, that was my idea --- let's get this in and we can then add a
>session id, and your point about restarts is a good one.
>
>  
>
>>>>. making sure the info is available when you need to log it - I had to
>>>>rearrange a few thing to avoid getting SEGVs, IIRC.
>>>>   
>>>>
>>>>        
>>>>
>>>Of course some messages, like postmaster status messages, don't have
>>>some of these fields, like username or host.  Is that going to cause
>>>problems for tools that read our log files?
>>>
>>>      
>>>
>>If users want a non-empty info string they will have to teach the tools 
>>to handle it anyway. The point was, however, that rolling up PID and 
>>timestamp into the printf-style format will require some significant 
>>work, because we wouldn't want to lose that info if the user/db weren't 
>>known, whereas the patch currently suppresses all log-info output if 
>>these are not present (i.e. when MyProcPort == NULL).
>>    
>>
>
>Oh, good point.  That would suggest that maybe we are better off leaving
>pid and timestamp as separate options and _not_ merge them into your new
>string.  I am starting to think having your string print only
>session-specific information is the best way.
>
>I wonder if we should rename your GUC log_session_line or something like
>that.
>
>  
>
>>>>Also, the session duration logging part of the patch is orthogonal to the
>>>>issue.
>>>>        
>>>>
>>>You mean query duration?  Yes, it is orthoginal.
>>>      
>>>
>>No, I meant the logging of the end of a session, including its duration, 
>>which was also in the patch.
>>    
>>
>
>Oh, I missed that one.  It seems like a nice addition.  I see you added
>it when they ask for log_connections.  Good idea.
>


I think you are looking at the original patch, not the revised patch, 
which is here: http://candle.pha.pa.us/mhonarc/patches2/msg00091.html 
and provides a separate GUC var called "log_session_end" - Neil wanted 
these not to be combined, IIRC.

I am agnostic as to the names of the variables.

cheers

andrew




Re: logging statement levels

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> I wrote:
> 
> >
> >
> > If nobody is working on this I am prepared to look at it:
> >
> > . Allow logging of only data definition(DDL), or DDL and modification 
> > statements
> >
> 
> 
> 
> Here are some options:
> 
> 1. change the type of "log_statement" option from boolean to string, 
> with allowed values of "all, mod, ddl, none" with default "none".
> 2. same as 1. but make boolean true values synonyms for "all" and 
> boolean false values synonyms for "none".
> 3. keep "log_statement" option as now and add a new option 
> "log_statement_level" with the same options as 1. but default to "all", 
> which will have no effect unless "log_statement" is true.

I like 1.

> Also, I assume "modification statements" means insert/update/delete, or 

Yes.

> are we talking about DDL mods (like "alter table")?

Alter is DDL.

> Finally, what about functions that have side effects? It would be nice 
> to be able to detect the statements to be logged at the syntactic level, 
> but it strikes me that that might not be possible.

Not possible.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: logging statement levels

From
Andrew Dunstan
Date:
Bruce Momjian wrote:

>Andrew Dunstan wrote:
>  
>


wow. that was nearly 3 months ago ...


>>I wrote:
>>
>>    
>>
>>>If nobody is working on this I am prepared to look at it:
>>>
>>>. Allow logging of only data definition(DDL), or DDL and modification 
>>>statements
>>>
>>>      
>>>
>>
>>Here are some options:
>>
>>1. change the type of "log_statement" option from boolean to string, 
>>with allowed values of "all, mod, ddl, none" with default "none".
>>2. same as 1. but make boolean true values synonyms for "all" and 
>>boolean false values synonyms for "none".
>>3. keep "log_statement" option as now and add a new option 
>>"log_statement_level" with the same options as 1. but default to "all", 
>>which will have no effect unless "log_statement" is true.
>>    
>>
>
>I like 1.
>
>  
>
>>Also, I assume "modification statements" means insert/update/delete, or 
>>    
>>
>
>Yes.
>
>  
>
>>are we talking about DDL mods (like "alter table")?
>>    
>>
>
>Alter is DDL.
>
>  
>
>>Finally, what about functions that have side effects? It would be nice 
>>to be able to detect the statements to be logged at the syntactic level, 
>>but it strikes me that that might not be possible.
>>    
>>
>
>Not possible.
>
>  
>

Subsequent discussion suggested we should add "syntax-errors" to the 
allowed values (and I would favor making it the default).

The problem is this - in order to make the decision about whether or not 
to log, we need to have parsed the statement (unless the level is set 
to  "all").  My simple approach, which would mean that the entire patch 
would amount to around 100 lines, maybe, plus docco,  would have the (I 
think) trivial side effect of reversing the order in which a logged 
statement and the corresponding parse error log entry appeared. You 
objected to that effect, so I stopped work on it.

Now I can think of a couple of different approaches which would not have 
this effect:
a. embed a time machine in postgres so we can make a decision based on 
information we do not yet have, or
b. find some spot where we can trap the parse error log message before 
it is emitted and then first log the statement. That spot is probably 
somewhere in src/backend/utils/error/elog.c, but I am not quite sure where.

I have rejected as ugly and unmaintainable monkeying with the parser 
itself to produce the desired effect, and I regret that idea a is beyond 
my humble abilities :-)

cheers

andrew



Re: logging statement levels

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> Bruce Momjian wrote:
> 
> >Andrew Dunstan wrote:
> >  
> >
> 
> 
> wow. that was nearly 3 months ago ...

Oh, I remember why I kept this email now. I am going to try to code
this.

> Subsequent discussion suggested we should add "syntax-errors" to the 
> allowed values (and I would favor making it the default).

We already have log_min_error_statement.  Seems that is what should be
used if someone wants only syntax errors.

> The problem is this - in order to make the decision about whether or not 
> to log, we need to have parsed the statement (unless the level is set 
> to  "all").  My simple approach, which would mean that the entire patch 
> would amount to around 100 lines, maybe, plus docco,  would have the (I 
> think) trivial side effect of reversing the order in which a logged 
> statement and the corresponding parse error log entry appeared. You 
> objected to that effect, so I stopped work on it.
> 
> Now I can think of a couple of different approaches which would not have 
> this effect:
> a. embed a time machine in postgres so we can make a decision based on 
> information we do not yet have, or
> b. find some spot where we can trap the parse error log message before 
> it is emitted and then first log the statement. That spot is probably 
> somewhere in src/backend/utils/error/elog.c, but I am not quite sure where.
> 
> I have rejected as ugly and unmaintainable monkeying with the parser 
> itself to produce the desired effect, and I regret that idea a is beyond 
> my humble abilities :-)

I will start on this now.  Thanks.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073