Re: Mixing different LC_COLLATE and database encodings - Mailing list pgsql-general

From Tatsuo Ishii
Subject Re: Mixing different LC_COLLATE and database encodings
Msg-id 20060221.102715.28783242.t-ishii@sraoss.co.jp
In response to Re: Mixing different LC_COLLATE and database encodings  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Mixing different LC_COLLATE and database encodings  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general
> On Sat, Feb 18, 2006 at 08:16:07PM -0800, Bill Moseley wrote:
> > Is the Holy Grail encoding and lc_collate settings per column?
>
> Well yes. I've been trying to create a system where you can handle
> multiple collations in the same database. I posted the details to
> -hackers and got part of the way, but it's a lot of work.
>
> As for encodings, to be honest, I'm not sure whether it's a great idea
> to support multiple encodings simultaneously. Things become a lot
> easier if you know everything is the same encoding. If you set the
> client_encoding automatically on startup it has pretty much the same
> effect as having the server always use that encoding. It's just a bit
> of time wasted in conversion, but the client doesn't need to care.
>
> By way of example, see ICU which is an internationalisation library
> we're considering to get consistent locale support over all platforms.
> It supports one encoding, namely UTF-16. It has various functions to
> convert other encodings to or from that, but internally it's all
> UTF-16. So if we do use that, then all encodings (except native UTF-16)
> will need conversion all the time, so you don't buy anything by
> having the server in some random encoding.
>
> The problem, of course, is that the SQL standard requires some encoding
> support. No-one has really come up with a proposal for that yet. IMHO,
> that's a parser issue more than anything else.

If you are considering allowing only UTF-16 (or any other single
encoding) in the backend, I am strongly against the idea. We Japanese
need native support for our local encodings. Converting between those
encodings and Unicode every time the backend and frontend communicate
would be a serious performance hit. Moreover, the conversion is known
not to be round-trip safe, which means some information is lost along
the way. Another point is the on-disk format: UTF-16 will require more
storage than the local encodings, and UTF-8 will probably require even
more.

My feeling is that ICU is good for applications, but not for
DBMSs.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
