Thread: syslogger line-end processing infelicity

syslogger line-end processing infelicity

From
Andrew Dunstan
Date:
I have been looking at the syslogger code in connection with the CSV log 
output proposal, and I'm quite concerned about the way it translates 
every \n into \r\n for Windows output. This has several problems, not 
least of which is that we can by no means assume that every \n has no 
semantic significance. Consider the following:

INSERT INTO mytable (textfield) VALUES ($$abc
def$$);

Now if the line ending there is in fact \r\n we will output \r\r\n for 
it, and if it is just \n we will output \r\n, and in neither case will 
we be logging what is actually inserted.

My first thought is that we might need to distinguish between newlines 
embedded in the query, which shouldn't be touched, from the newline at 
the end of the log line.

My second thought is that we should quite possibly abandon this 
translation altogether - we know that our COPY code is quite happy with 
either style of line ending, as long as the file is consistent, and also 
many Windows programs will quite happily read files with Unix style line 
endings (e.g. Wordpad, although not Notepad).

My third thought is that even so my first thought has some virtue :-). 
We really need to ensure that we only rotate files when we are at a 
genuine end of log line - the situation that Greg Smith described of 
having the rotator chop a line in two between two files seem wholly 
unacceptable.

Thoughts?


cheers

andrew



Re: syslogger line-end processing infelicity

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I have been looking at the syslogger code in connection with the CSV log 
> output proposal, and I'm quite concerned about the way it translates 
> every \n into \r\n for Windows output.
> ...
> My second thought is that we should quite possibly abandon this 
> translation altogether

+1 on that.  The problem of ensuring atomic output remains though
(see nearby complaints from George Pavlov and others).  It's bad enough
in the existing logging context, but if we're hoping to make
easily-machine-readable logs it's a "must fix".
        regards, tom lane


Re: syslogger line-end processing infelicity

From
Magnus Hagander
Date:
Andrew Dunstan wrote:
> 
> I have been looking at the syslogger code in connection with the CSV log
> output proposal, and I'm quite concerned about the way it translates
> every \n into \r\n for Windows output. This has several problems, not
> least of which is that we can by no means assume that every \n has no
> semantic significance. Consider the following:
> 
> INSERT INTO mytable (textfield) VALUES ($$abc
> def$$);
> 
> Now if the line ending there is in fact \r\n we will output \r\r\n for
> it, and if it is just \n we will output \r\n, and in neither case will
> we be logging what is actually inserted.
> 
> My first thought is that we might need to distinguish between newlines
> embedded in the query, which shouldn't be touched, from the newline at
> the end of the log line.
> 
> My second thought is that we should quite possibly abandon this
> translation altogether - we know that our COPY code is quite happy with
> either style of line ending, as long as the file is consistent, and also
> many Windows programs will quite happily read files with Unix style line
> endings (e.g. Wordpad, although not Notepad).

Agreed. We shouldn't touch the data. Every editor I know on windows
*except* notepad can deal just fine with Unix line endings, and if
you're logging your queries your logfile will be too large to work well
in notepad anyway :-)


//Magnus


Re: syslogger line-end processing infelicity

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I have been looking at the syslogger code in connection with the CSV log 
>> output proposal, and I'm quite concerned about the way it translates 
>> every \n into \r\n for Windows output.
>> ...
>> My second thought is that we should quite possibly abandon this 
>> translation altogether
>>     
>
> +1 on that.  The problem of ensuring atomic output remains though
> (see nearby complaints from George Pavlov and others).  It's bad enough
> in the existing logging context, but if we're hoping to make
> easily-machine-readable logs it's a "must fix".
>
>         
>   

Is that the one you suggested trying to fix by calling write() instead 
of fprintf()? If so, I can't think of any good reason not to do that 
anyway. (I assume we're not worried about the use of vfprintf(stderr, 
...) in write_stderr().)

That really means we have two problems: log lines can be interleaved, 
and rotation can cause a line to be split over two files. Triple ugh.

cheers

andrew





Re: syslogger line-end processing infelicity

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> +1 on that.  The problem of ensuring atomic output remains though
>> (see nearby complaints from George Pavlov and others).

> Is that the one you suggested trying to fix by calling write() instead 
> of fprintf()? If so, I can't think of any good reason not to do that 
> anyway.

Probably not, but it doesn't fix the problem for long log lines (more
than PIPE_BUF bytes).

The other little problem (which is the reason we like the stderr
approach in the first place) is that not all the stderr output we want
to capture comes from code under our control.  This may not be a huge
problem in production situations, since the main issue in my experience
is being able to capture dynamic-linker messages when shlib loading fails.
But it is a stumbling block in the way of any proposals that involve
having a more structured protocol for the stuff going down the wire :-(
        regards, tom lane


Re: syslogger line-end processing infelicity

From
Andrew Dunstan
Date:

Tom Lane wrote:
> The other little problem (which is the reason we like the stderr
> approach in the first place) is that not all the stderr output we want
> to capture comes from code under our control.  This may not be a huge
> problem in production situations, since the main issue in my experience
> is being able to capture dynamic-linker messages when shlib loading fails.
> But it is a stumbling block in the way of any proposals that involve
> having a more structured protocol for the stuff going down the wire :-(
>
>
>   

I don't think that need worry us about CSV output - AFAICS it's 
redirected quite separately from stderr - more like syslog really, so 
the CSV output *is* all from code under our control.

I'm pondering some very simple method of signalling the end of a CSV 
line, like appending a null byte (which we would of course strip out, so 
it would never appear on the file), and only allowing a CSV log rotation 
if we are on a boundary.

cheers

andrew


Re: syslogger line-end processing infelicity

From
Andrew Dunstan
Date:

Magnus Hagander wrote:
>> My second thought is that we should quite possibly abandon this
>> translation altogether - we know that our COPY code is quite happy with
>> either style of line ending, as long as the file is consistent, and also
>> many Windows programs will quite happily read files with Unix style line
>> endings (e.g. Wordpad, although not Notepad).
>>     
>
> Agreed. We shouldn't touch the data. Every editor I know on windows
> *except* notepad can deal just fine with Unix line endings, and if
> you're logging your queries your logfile will be too large to work well
> in notepad anyway :-)
>
>
>
>   

OK, so do we consider this a bug fix and backpatch it all the way to 
8.0? Nobody's complained so far that I know of, and it's only damaged 
logs, not damaged primary data. I'm inclined just to fix it in HEAD, and 
release note the change in behaviour. It will matter more when we get 
machine-readable logs.

cheers

andrew


Re: syslogger line-end processing infelicity

From
Magnus Hagander
Date:
Andrew Dunstan wrote:
> 
> 
> Magnus Hagander wrote:
>>> My second thought is that we should quite possibly abandon this
>>> translation altogether - we know that our COPY code is quite happy with
>>> either style of line ending, as long as the file is consistent, and also
>>> many Windows programs will quite happily read files with Unix style line
>>> endings (e.g. Wordpad, although not Notepad).
>>>     
>>
>> Agreed. We shouldn't touch the data. Every editor I know on windows
>> *except* notepad can deal just fine with Unix line endings, and if
>> you're logging your queries your logfile will be too large to work well
>> in notepad anyway :-)
>>
>>
>>
>>   
> 
> OK, so do we consider this a bug fix and backpatch it all the way to
> 8.0? Nobody's complained so far that I know of, and it's only damaged
> logs, not damaged primary data. I'm inclined just to fix it in HEAD, and
> release note the change in behaviour. It will matter more when we get
> machine-readable logs.

Agreed, I don't think we need to backpatch it. And if you do, you only
need to go as far as to 8.2, there are other bigger problems earlier
than that - the reason we're deprecating 8.0 and 8.1 (on win32 only!)
when 8.3 comes out. I guess if it applies with no changes you can
backpatch all the way (if at all), but don't put any extra work into it,
IMHO.

//Magnus


Re: syslogger line-end processing infelicity

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> +1 on that.  The problem of ensuring atomic output remains though
>>> (see nearby complaints from George Pavlov and others).
>>>       
>
>   
>> Is that the one you suggested trying to fix by calling write() instead 
>> of fprintf()? If so, I can't think of any good reason not to do that 
>> anyway.
>>     
>
> Probably not, but it doesn't fix the problem for long log lines (more
> than PIPE_BUF bytes).
>
> The other little problem (which is the reason we like the stderr
> approach in the first place) is that not all the stderr output we want
> to capture comes from code under our control.  This may not be a huge
> problem in production situations, since the main issue in my experience
> is being able to capture dynamic-linker messages when shlib loading fails.
> But it is a stumbling block in the way of any proposals that involve
> having a more structured protocol for the stuff going down the wire :-(
>
>
>   


I have been trying to think of how we can get around the problem of 
multiplexing our own output inappropriately. I have no great insights, 
but I did think of these:

. use one pipe per backend instead of one per postmaster, and have the 
syslogger poll them all.
. use a mutex to control access to the pipe
. same as previous but use a worker thread for each backend to do 
logging so blocking on the mutex wouldn't block the backend

All of these look like a lot of work for a relatively infrequent 
problem, not to mention plenty of other disadvantages.

cheers

andrew