Re: Mixing different LC_COLLATE and database encodings - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Re: Mixing different LC_COLLATE and database encodings
Msg-id 20060221064407.GA24481@svana.org
In response to Re: Mixing different LC_COLLATE and database encodings  (Tatsuo Ishii <ishii@sraoss.co.jp>)
List pgsql-general
On Tue, Feb 21, 2006 at 10:27:15AM +0900, Tatsuo Ishii wrote:
> If you consider to allow only UTF-16 or whatever encoding in backend,
> I will strongly against the idea. We Japanese need those encodings
> native support. Converting those encodings with Unicode every time when
> backend and frontend have conversations will be serious performance
> hit. Moreover the conversion is known as not being roundtrip safe, that
> means some information will be lost during the conversion. The another
> point would be on disk format. UTF-16 will require more storage than
> local encodings. Probably UTF-8 will require more.

I didn't say that we would only support UTF-16 in the backend. I said
that when doing comparisons in a non-C locale, you have to convert to
UTF-16 to use ICU. If you don't want to use it, don't; it's not going
to be required at any point. It's just like the current situation on
Win32: if you use UTF-8, it has to be converted to UTF-16 prior to
string comparison.
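
A minimal sketch of that comparison path, in Python rather than the
backend's C. The helper names `to_utf16_units` and `compare` are
hypothetical, and `euc_jp` is just one example of a database encoding.
The stored bytes stay in their own encoding; only a non-C comparison
converts to UTF-16. The final code-unit comparison is a stand-in for
the ICU collator call, which is not shown here.

```python
import struct

def to_utf16_units(stored, db_encoding):
    """Decode bytes from the database encoding into a list of UTF-16 code units."""
    utf16 = stored.decode(db_encoding).encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(utf16) // 2), utf16))

def compare(a, b, db_encoding, c_locale):
    """Return -1, 0 or 1, in the style of strcmp (hypothetical helper)."""
    if c_locale:
        # C locale: compare the stored bytes directly, no conversion at all.
        return (a > b) - (a < b)
    # Non-C locale: convert both sides to UTF-16 only for this comparison.
    ua = to_utf16_units(a, db_encoding)
    ub = to_utf16_units(b, db_encoding)
    # Stand-in for the collator: plain code-unit order, NOT a real collation.
    return (ua > ub) - (ua < ub)
```

Nothing here changes what is stored on disk or sent to the client; the
conversion cost is paid per comparison, and only outside the C locale.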

The only time any of this is required is *sorting*, and if you have an
index defined it acts as a cache for the sorted values. Of course
there's a tradeoff, but unless you're sorting large datasets all day I
doubt it'll be noticeable.
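
To illustrate the "index as a cache" point, here is a toy sketch in
Python. `SortedIndex` is a hypothetical class, not how a real btree is
implemented, and `str.casefold` merely stands in for an expensive
collation step. The point is only that the ordering work is done when a
row is inserted, so an ordered read afterwards needs no further
collation.

```python
import bisect

class SortedIndex:
    """Toy ordered index: the ordering cost is paid once, at insert time."""

    def __init__(self, key_func):
        self._key_func = key_func   # stand-in for an expensive collation step
        self._entries = []          # (sort_key, value) pairs, kept in order
        self.key_calls = 0          # how many times the key was computed

    def insert(self, value):
        self.key_calls += 1
        bisect.insort(self._entries, (self._key_func(value), value))

    def scan(self):
        # An ordered read: no key computation, no comparisons.
        return [value for _, value in self._entries]
```

For example, after inserting three values into
`SortedIndex(str.casefold)`, `scan()` can be called any number of times
while `key_calls` stays at three.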

If you're not sorting, none of this is relevant to you.

> I have a feeling that ICU is good for applications, but is not for
> DBMSs.

I think providing a system where users can choose from a large range
of possible collation orders and, if necessary, specify their own is a
worthy goal. Look at the complaints we get now and then from people who
choose en_US as their locale and are surprised when it gives them a
dictionary sort.
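
The surprise is easy to reproduce with a toy example in Python. The key
function `toy_dictionary_key` is a rough approximation I made up for
illustration (ignore case, spaces and punctuation first, then break
ties on the original string); it is not the actual en_US rule set. The
C-locale key simply orders by code point.

```python
def c_locale_key(s):
    # C locale: order by raw code points, so uppercase ASCII sorts
    # before lowercase and a space sorts before any letter.
    return [ord(ch) for ch in s]

def toy_dictionary_key(s):
    # Toy approximation of a dictionary sort, NOT the real en_US rules:
    # compare letters and digits case-insensitively, then tie-break
    # on the original string.
    letters = "".join(ch for ch in s if ch.isalnum()).casefold()
    return (letters, s)

words = ["banana", "Zebra", "apple pie", "Apple", "applesauce"]

c_order = sorted(words, key=c_locale_key)
# ['Apple', 'Zebra', 'apple pie', 'applesauce', 'banana']

dict_order = sorted(words, key=toy_dictionary_key)
# ['Apple', 'apple pie', 'applesauce', 'banana', 'Zebra']
```

Neither order is wrong; they answer different questions, which is why
letting users pick the collation is worthwhile.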

ICU allows users to take an existing collation and tweak it if it
doesn't quite match their expectations. You think this is not useful
for a DBMS?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
