
Re: [BUGS] main log encoding problem - Mailing list pgsql-general

From	Alexander Law
Subject	Re: [BUGS] main log encoding problem
Date	July 19, 2012 03:38:12
Msg-id	5007AB3D.3010501@gmail.com
In response to	Re: [BUGS] main log encoding problem (Tatsuo Ishii <ishii@postgresql.org>)
Responses	Re: [BUGS] main log encoding problem
List	pgsql-general


Hello,

C. We have one logfile with UTF-8.
Pros: Log messages of all our clients can fit in it. We can use any
generic editor/viewer to open it.
Nothing changes for Linux (and other OSes with UTF-8 encoding).
Cons: All the strings written to the log file have to go through some
conversion function.
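A minimal sketch of option C. It uses Python's standard codecs as a stand-in for the server's own conversion routines, which is an assumption: PostgreSQL's functions are not shown here. The function name `write_utf8_log` and the `(message bytes, client encoding)` record shape are hypothetical. Each message is decoded from its client's encoding and re-encoded as UTF-8 before being appended to a single shared log stream.

```python
from typing import BinaryIO, Iterable, Tuple

def write_utf8_log(records: Iterable[Tuple[bytes, str]], stream: BinaryIO) -> None:
    """Append (raw message, client encoding) records to one UTF-8 log stream.

    Hypothetical helper illustrating option C: every message passes through
    a conversion step before it reaches the shared log.
    """
    for raw, client_encoding in records:
        # Conversion step: client encoding -> Unicode text.
        # errors="replace" turns undecodable bytes into U+FFFD instead of
        # dropping the whole log line.
        text = raw.decode(client_encoding, errors="replace")
        # Unicode text -> UTF-8 bytes, one message per line.
        stream.write(text.encode("utf-8") + b"\n")
```

In practice `stream` could be a file opened with `open(path, "ab")`. Using `errors="replace"` is a design choice for the sketch: a malformed message still produces a log line, with replacement characters where bytes could not be decoded.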

I think that the last solution is the solution. What is your opinion?

I am thinking about a variant of C.

The problem with C is that converting from other encodings to UTF-8 is
not cheap, because it requires huge conversion tables. This may be a
serious problem on a busy server. It is also possible that some
information is lost in this conversion, because there is no guarantee
of a one-to-one mapping between UTF-8 and other encodings. Another
problem with UTF-8 is that you have to choose *one* locale when using
your editor. This may or may not affect how your editor handles
strings.
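Whether a particular message survives conversion can be checked with a round-trip test: decode from the source encoding, encode to UTF-8, convert back, and compare the bytes. This sketch relies on Python's codec tables, which is an assumption: a `True` result says nothing about PostgreSQL's own conversion tables. `converts_losslessly` is a hypothetical helper name.

```python
def converts_losslessly(raw: bytes, encoding: str) -> bool:
    """Return True if raw survives encoding -> UTF-8 -> encoding unchanged.

    Uses Python's codec tables, not PostgreSQL's conversion routines.
    """
    try:
        utf8 = raw.decode(encoding).encode("utf-8")
        back = utf8.decode("utf-8").encode(encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Some byte or character has no mapping in one direction,
        # so information would be lost.
        return False
    # Equal bytes mean this message converts without loss.
    return back == raw
```

For example, every byte of Latin-1 round-trips, while byte 0x81 has no defined character in Python's `cp1252` codec, so the check reports a loss for it.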

My idea is to use the mule-internal encoding for the log file instead of
UTF-8. There are several advantages:

1) Conversion to the mule-internal encoding is cheap because no conversion table is required. Also, no information is lost in this conversion.

2) The mule-internal encoding can be handled by Emacs, one of the most popular editors in the world.

3) No need to worry about locale. The mule-internal encoding carries enough information about the language.
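Point 1 can be illustrated for a single-byte character set: converting to a mule-style representation only prefixes each non-ASCII byte with a leading byte that identifies the character set, so no lookup table is involved and the original byte is kept intact. This is a simplified sketch: `LEADING_BYTE` is a hypothetical placeholder, not taken from the actual leading-byte assignments in PostgreSQL or Emacs, and multi-byte character sets are not covered.

```python
# Hypothetical leading byte for one single-byte charset (illustrative only;
# the real mule-internal leading-byte values are not reproduced here).
LEADING_BYTE = 0x81

def to_mule(raw: bytes, leading_byte: int = LEADING_BYTE) -> bytes:
    """Tag every non-ASCII byte with the charset's leading byte (no table)."""
    out = bytearray()
    for b in raw:
        if b & 0x80:
            out.append(leading_byte)  # charset identity travels with the char
        out.append(b)                 # original byte is kept unchanged
    return bytes(out)

def from_mule(data: bytes, leading_byte: int = LEADING_BYTE) -> bytes:
    """Strip the leading bytes again; inverse of to_mule for one charset."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(b)            # ASCII passes through untouched
            i += 1
        elif b == leading_byte and i + 1 < len(data):
            out.append(data[i + 1])  # drop the tag, keep the character byte
            i += 2
        else:
            raise ValueError(f"unexpected byte 0x{b:02x} at offset {i}")
    return bytes(out)
```

Because the tag records which character set each character came from, text from several client encodings could share one file without picking a single locale, which is the property point 3 describes.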
--

I believe that Postgres has such conversion functions anyway. They are used for data conversion when we have clients (and databases) with different encodings. So if they can be used for data, why not use them for the relatively small volume of log messages?
And regarding the mule-internal encoding: reading about Mule at http://www.emacswiki.org/emacs/UnicodeEncoding, I found:
In future (probably Emacs 22), Mule will use an internal encoding which is a UTF-8 encoding of a superset of Unicode.
So I still see UTF-8 as a common denominator for all the encodings.
I am not aware of any characters absent from Unicode. Can you please provide some examples of those that can result in lossy conversion?
Choosing UTF-8 in a viewer/editor is no big deal either. Most of them detect UTF-8 automagically, and for the others a BOM can be added.
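A sketch of the BOM suggestion, assuming each log file is created and appended to by a single writer. The BOM (`codecs.BOM_UTF8`, the bytes EF BB BF) is written only when the file is empty, since a BOM in the middle of a file would simply show up as a U+FEFF character. `open_utf8_log` is a hypothetical helper, not a PostgreSQL setting.

```python
import codecs
import os
from typing import BinaryIO

def open_utf8_log(path: str) -> BinaryIO:
    """Open a log for appending; write a UTF-8 BOM only if the file is empty."""
    f = open(path, "ab")
    # seek() returns the new absolute position, i.e. the current file size.
    if f.seek(0, os.SEEK_END) == 0:
        f.write(codecs.BOM_UTF8)  # b"\xef\xbb\xbf"
    return f
```

Readers written in Python can drop the BOM transparently with the `utf-8-sig` codec. Tools that do not expect a BOM may show it as stray bytes at the start of the file, so it is a trade-off for viewers that cannot detect UTF-8 on their own.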

Best regards,
Alexander
