Re: BUG #5661: The character encoding in logfile is confusing. - Mailing list pgsql-hackers
From | tkbysh2000@yahoo.co.jp |
---|---|
Subject | Re: BUG #5661: The character encoding in logfile is confusing. |
Date | |
Msg-id | 20100922212552.93B2.A495B709@yahoo.co.jp |
In response to | Re: BUG #5661: The character encoding in logfile is confusing. (Craig Ringer <craig@postnewspapers.com.au>) |
List | pgsql-hackers |
Hi Craig,

Almost all Japanese software writes its log files in the default encoding of the server it is running on. I'm not sure whether that is the best way, but Japanese users take it for granted. So I think Japanese users would expect the PostgreSQL server to behave the same way as other software, because many administrators in Japan are familiar and experienced with that approach.

On Unix, the user can specify the default character encoding at installation time. Software can obtain it by reading the environment variable $LANG, e.g.

> % echo $LANG
> ja_JP.eucJP

On Japanese Windows, the default encoding is MS-932 (also called CP932 or Windows-31J), and this is fixed. MS-932 is almost the same as Shift-JIS, but a very few characters have different character codes in MS-932 and Shift-JIS, and Shift-JIS lacks some characters that MS-932 has. This is a very important issue; it has caused a lot of related bugs, for example:

http://bugs.mysql.com/bug.php?id=7607

Also, if PostgreSQL could be configured to write raw English messages to the log file, some users would choose that if translating the messages into Japanese carries some cost. Some administrators in Japan don't mind reading English messages. (Much software is not friendly to non-English users. Many Japanese users are surprised and impressed that PostgreSQL emits Japanese messages in its log file.)

Thank you.

=Mikio
--
<tkbysh2000@yahoo.co.jp>

On Wed, 22 Sep 2010 19:25:47 +0800 Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 22/09/2010 5:45 PM, Peter Eisentraut wrote:
> > On ons, 2010-09-22 at 16:25 +0800, Craig Ringer wrote:
> >> A single log file should obviously be in a single encoding, it's the
> >> only sane way to do things. But which encoding is it in? And which
> >> *should* it be in?
> >
> > We need to produce the log output in the server encoding, because that's
> > how we need to send it to the client.
>
> That doesn't mean it can't be recoded for writing to the log file,
> though. Perhaps it needs to be.
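To make the two Japanese defaults described above concrete, here is a small Python sketch (not PostgreSQL code). `codeset_from_lang` is a hypothetical helper that extracts the codeset from a `$LANG` value, assuming the usual `language_TERRITORY.codeset@modifier` form, and normalizes it with `codecs.lookup`. The second part uses Python's built-in `cp932` and `shift_jis` codecs to show the kind of MS-932/Shift-JIS mismatch mentioned above; the example characters are my own choice, not taken from the linked bug report.

```python
import codecs
from typing import Optional


def codeset_from_lang(lang: str) -> Optional[str]:
    """Return Python's codec name for the codeset in a locale string such as
    'ja_JP.eucJP', or None when the string has no codeset part.

    Hypothetical helper; assumes the language_TERRITORY.codeset@modifier form.
    Raises LookupError if Python does not know the codeset.
    """
    lang = lang.split("@", 1)[0]            # drop any @modifier
    _, dot, codeset = lang.partition(".")
    if not dot or not codeset:
        return None
    return codecs.lookup(codeset).name      # e.g. 'eucJP' -> 'euc_jp'


print(codeset_from_lang("ja_JP.eucJP"))     # euc_jp

# U+2460 CIRCLED DIGIT ONE is an NEC extension present in MS-932 but not in Shift-JIS.
circled_one = "\u2460"
print(circled_one.encode("cp932"))          # b'\x87@'
try:
    circled_one.encode("shift_jis")
except UnicodeEncodeError:
    print("U+2460 cannot be encoded as shift_jis")

# The same two bytes decode to different characters in the two encodings.
raw = b"\x81\x60"
print(hex(ord(raw.decode("cp932"))))        # 0xff5e (FULLWIDTH TILDE)
print(hex(ord(raw.decode("shift_jis"))))    # 0x301c (WAVE DASH)
```

Round-tripping text through the "wrong" one of these two codecs is exactly how a character can silently change or become unencodable.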
> It should be reasonably practical to
> detect when the database and log encoding are the same and avoid the
> transcoding performance penalty, not that it's big anyway.
>
> > If you have different databases
> > with different server encodings, you will get inconsistently encoded
> > output in the log file.
>
> I don't think that's an OK answer, myself. Mixed encodings with no
> delineation in one file = bug as far as I'm concerned. You can't even
> rely on being able to search the log anymore. You'll only get away with
> it when using languages that mostly stick to the 7-bit ASCII subset, so
> most text is still readable; with most other languages you'll get logs
> full of what looks to the user like garbage.
>
> > Conceivably, we could create a configuration option that specifies the
> > encoding for the log file, and strings are recoded from whatever gettext()
> > produces to the specified encoding. initdb could initialize that option
> > suitably, so in most cases users won't have to do anything.
>
> Yep, I tend to think that'd be the right way to go. It'd still be a bit
> of a pain, though, as messages written to stdout/stderr by the
> postmaster should be in the system encoding, but messages written to the
> log files should be in the encoding specified for logs, unless logging
> is being done to syslog, in which case it has to be in the system
> encoding after all...
>
> And, of course, the postmaster still doesn't know how to log anything it
> might emit before reading postgresql.conf, because it doesn't know what
> encoding to use.
>
> I still wonder if, rather than making this configurable, the right
> choice is to force logging to UTF-8 (with BOM) across the board, right
> from postmaster startup. It's consistent, all messages in all other
> encodings can be converted to UTF-8 for logging, it's platform
> independent, and text editors etc tend to understand and recognise UTF-8
> especially with the BOM.
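The recoding step discussed above might look roughly like this in Python. It is only an illustration of the idea, not how the PostgreSQL backend works: `recode_for_log` is a hypothetical helper, and its `log_encoding` parameter stands in for the hypothetical log-file encoding option Peter describes. Following Craig's suggestion, it skips transcoding entirely when the server and log encodings already match.

```python
import codecs


def recode_for_log(message: bytes, server_encoding: str, log_encoding: str) -> bytes:
    """Convert a message encoded in server_encoding into log_encoding.

    Hypothetical helper: both encoding names are assumed settings, and any
    name Python's codec registry understands is accepted.
    """
    src = codecs.lookup(server_encoding).name
    dst = codecs.lookup(log_encoding).name
    if src == dst:
        # Same encoding after normalization: no transcoding cost at all.
        return message
    text = message.decode(src, errors="replace")
    # errors="replace" keeps logging working when a character has no mapping
    # in the log encoding; such characters are written as '?'.
    return text.encode(dst, errors="replace")


msg = "広告掲載".encode("euc_jp")
print(recode_for_log(msg, "eucJP", "EUC-JP") is msg)   # True: names normalize alike
print(recode_for_log(msg, "euc_jp", "utf-8"))
```

If the target is Python's `utf-8-sig` codec, every converted message gets its own BOM; for a "UTF-8 with BOM" log as Craig proposes, the BOM would have to be written once at the start of the file instead.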
>
> Unfortunately, because many unix utilities (grep etc) aren't encoding
> aware, that'll cause problems when people go to search log files. For
> (eg) "広告掲載" The log files will contain the utf-8 bytes:
>
> \xe5\xba\x83\xe5\x91\x8a\xe6\x8e\xb2\xe8\xbc\x89
>
> but grep on a shift-jis system will be looking for:
>
> \x8d\x4c\x8d\x90\x8cf\x8d\xda
>
> so it won't match.
>
>
> Ugh. If only we could say "PostgreSQL requires a system locale with a
> UTF-8 encoding". Alas, I don't think that'd go down very well with
> packagers or installers. [Insert rant about how stupid it is that *nix
> systems still aren't all UTF-8 here].
>
> --
> Craig Ringer
>
> Tech-related writing at http://soapyfrogs.blogspot.com/
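Craig's byte sequences can be reproduced with Python's standard codecs. The sketch below shows why a raw byte search, which is essentially what grep does, fails across encodings, while comparing after decoding each side with its own encoding succeeds. `contains_text` is a hypothetical helper written for this illustration, not an existing tool, and the `LOG:` line is made-up sample data.

```python
word = "広告掲載"

utf8_bytes = word.encode("utf-8")
sjis_bytes = word.encode("shift_jis")

print(utf8_bytes)   # b'\xe5\xba\x83\xe5\x91\x8a\xe6\x8e\xb2\xe8\xbc\x89'
print(sjis_bytes)   # b'\x8dL\x8d\x90\x8cf\x8d\xda'


def contains_text(haystack: bytes, haystack_encoding: str,
                  needle: bytes, needle_encoding: str) -> bool:
    """Search by decoded text instead of raw bytes (hypothetical helper)."""
    return needle.decode(needle_encoding) in haystack.decode(haystack_encoding)


# A line as it might appear in a UTF-8 log file (sample data).
log_line = ("LOG: " + word).encode("utf-8")

print(sjis_bytes in log_line)   # False: the byte-level search misses
# "utf-8-sig" also accepts input with or without a leading BOM.
print(contains_text(log_line, "utf-8-sig", sjis_bytes, "shift_jis"))   # True
```

Note that Python prints `\x8c\x66` as `\x8cf` because 0x66 is the ASCII letter "f", which matches the form of the Shift-JIS bytes quoted above.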