Thread: syslogger line-end processing infelicity
I have been looking at the syslogger code in connection with the CSV log output proposal, and I'm quite concerned about the way it translates every \n into \r\n for Windows output. This has several problems, not least of which is that we can by no means assume that every \n has no semantic significance. Consider the following: INSERT INTO mytable (textfield) VALUES ($$abc def$$); Now if the line ending there is in fact \r\n we will output \r\r\n for it, and if it is just \n we will output \r\n, and in neither case will we be logging what is actually inserted. My first thought is that we might need to distinguish between newlines embedded in the query, which shouldn't be touched, from the newline at the end of the log line. My second thought is that we should quite possibly abandon this translation altogether - we know that our COPY code is quite happy with either style of line ending, as long as the file is consistent, and also many Windows programs will quite happily read files with Unix style line endings (e.g. Wordpad, although not Notepad). My third thought is that even so my first thought has some virtue :-). We really need to ensure that we only rotate files when we are at a genuine end of log line - the situation that Greg Smith described of having the rotator chop a line in two between two files seem wholly unacceptable. Thoughts? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I have been looking at the syslogger code in connection with the CSV log > output proposal, and I'm quite concerned about the way it translates > every \n into \r\n for Windows output. > ... > My second thought is that we should quite possibly abandon this > translation altogether +1 on that. The problem of ensuring atomic output remains though (see nearby complaints from George Pavlov and others). It's bad enough in the existing logging context, but if we're hoping to make easily-machine-readable logs it's a "must fix". regards, tom lane
Andrew Dunstan wrote: > > I have been looking at the syslogger code in connection with the CSV log > output proposal, and I'm quite concerned about the way it translates > every \n into \r\n for Windows output. This has several problems, not > least of which is that we can by no means assume that every \n has no > semantic significance. Consider the following: > > INSERT INTO mytable (textfield) VALUES ($$abc > def$$); > > Now if the line ending there is in fact \r\n we will output \r\r\n for > it, and if it is just \n we will output \r\n, and in neither case will > we be logging what is actually inserted. > > My first thought is that we might need to distinguish between newlines > embedded in the query, which shouldn't be touched, from the newline at > the end of the log line. > > My second thought is that we should quite possibly abandon this > translation altogether - we know that our COPY code is quite happy with > either style of line ending, as long as the file is consistent, and also > many Windows programs will quite happily read files with Unix style line > endings (e.g. Wordpad, although not Notepad). Agreed. We shouldn't touch the data. Every editor I know on windows *except* notepad can deal just fine with Unix line endings, and if you're logging your queries your logfile will be too large to work well in notepad anyway :-) //Magnus
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> I have been looking at the syslogger code in connection with the CSV log >> output proposal, and I'm quite concerned about the way it translates >> every \n into \r\n for Windows output. >> ... >> My second thought is that we should quite possibly abandon this >> translation altogether >> > > +1 on that. The problem of ensuring atomic output remains though > (see nearby complaints from George Pavlov and others). It's bad enough > in the existing logging context, but if we're hoping to make > easily-machine-readable logs it's a "must fix". > > > Is that the one you suggested trying to fix by calling write() instead of fprintf()? If so, I can't think of any good reason not to do that anyway. (I assume we're not worried about the use of vfprintf(stderr, ...) in write_stderr().) That really means we have two problems: log lines can be interleaved, and rotation can cause a line to be split over two files. Triple ugh. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> +1 on that. The problem of ensuring atomic output remains though >> (see nearby complaints from George Pavlov and others). > Is that the one you suggested trying to fix by calling write() instead > of fprintf()? If so, I can't think of any good reason not to do that > anyway. Probably not, but it doesn't fix the problem for long log lines (more than PIPE_BUF bytes). The other little problem (which is the reason we like the stderr approach in the first place) is that not all the stderr output we want to capture comes from code under our control. This may not be a huge problem in production situations, since the main issue in my experience is being able to capture dynamic-linker messages when shlib loading fails. But it is a stumbling block in the way of any proposals that involve having a more structured protocol for the stuff going down the wire :-( regards, tom lane
Tom Lane wrote: > The other little problem (which is the reason we like the stderr > approach in the first place) is that not all the stderr output we want > to capture comes from code under our control. This may not be a huge > problem in production situations, since the main issue in my experience > is being able to capture dynamic-linker messages when shlib loading fails. > But it is a stumbling block in the way of any proposals that involve > having a more structured protocol for the stuff going down the wire :-( > > > I don't think that need worry us about CSV output - AFAICS it's redirected quite separately from stderr - more like syslog really, so the CSV output *is* all from code under our control. I'm pondering some very simple method of signalling the end of a CSV line, like appending a null byte (which we would of course strip out, so it would never appear on the file), and only allowing a CSV log rotation if we are on a boundary. cheers andrew
Magnus Hagander wrote: >> My second thought is that we should quite possibly abandon this >> translation altogether - we know that our COPY code is quite happy with >> either style of line ending, as long as the file is consistent, and also >> many Windows programs will quite happily read files with Unix style line >> endings (e.g. Wordpad, although not Notepad). >> > > Agreed. We shouldn't touch the data. Every editor I know on windows > *except* notepad can deal just fine with Unix line endings, and if > you're logging your queries your logfile will be too large to work well > in notepad anyway :-) > > > > OK, so do we consider this a bug fix and backpatch it all the way to 8.0? Nobody's complained so far that I know of, and it's only damaged logs, not damaged primary data. I'm inclined just to fix it in HEAD, and release note the change in behaviour. It will matter more when we get machine-readable logs. cheers andrew
Andrew Dunstan wrote: > > > Magnus Hagander wrote: >>> My second thought is that we should quite possibly abandon this >>> translation altogether - we know that our COPY code is quite happy with >>> either style of line ending, as long as the file is consistent, and also >>> many Windows programs will quite happily read files with Unix style line >>> endings (e.g. Wordpad, although not Notepad). >>> >> >> Agreed. We shouldn't touch the data. Every editor I know on windows >> *except* notepad can deal just fine with Unix line endings, and if >> you're logging your queries your logfile will be too large to work well >> in notepad anyway :-) >> >> >> >> > > OK, so do we consider this a bug fix and backpatch it all the way to > 8.0? Nobody's complained so far that I know of, and it's only damaged > logs, not damaged primary data. I'm inclined just to fix it in HEAD, and > release note the change in behaviour. It will matter more when we get > machine-readable logs. Agreed, I don't think we need to backpatch it. And if you do, you only need to go as far as to 8.2, there are other bigger problems earlier than that - the reason we're deprecating 8.0 and 8.1 (on win32 only!) when 8.3 comes out. I guess if it applies with no changes you can backpatch all the way (if at all), but don't put any extra work into it, IMHO. //Magnus
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Tom Lane wrote: >> >>> +1 on that. The problem of ensuring atomic output remains though >>> (see nearby complaints from George Pavlov and others). >>> > > >> Is that the one you suggested trying to fix by calling write() instead >> of fprintf()? If so, I can't think of any good reason not to do that >> anyway. >> > > Probably not, but it doesn't fix the problem for long log lines (more > than PIPE_BUF bytes). > > The other little problem (which is the reason we like the stderr > approach in the first place) is that not all the stderr output we want > to capture comes from code under our control. This may not be a huge > problem in production situations, since the main issue in my experience > is being able to capture dynamic-linker messages when shlib loading fails. > But it is a stumbling block in the way of any proposals that involve > having a more structured protocol for the stuff going down the wire :-( > > > I have been trying to think of how we can get around the problem of multiplexing our own output inappropriately. I have no great insights, but I did think of these: . use one pipe per backend instead of one per postmaster, and have the syslogger poll them all. . use a mutex to control access to the pipe . same as previous but use a worker thread for each backend to do logging so blocking on the mutex wouldn't block the backend All of these look like a lot of work for a relatively infrequent problem, not to mention plenty of other disadvantages. cheers andrew