Re: different sort order in windows and linux version - Mailing list pgsql-general

From Agent M
Subject Re: different sort order in windows and linux version
Date
Msg-id 910202e1b6b4f76ae4f871e165d4e01f@themactionfaction.com
In response to Re: different sort order in windows and linux version  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: different sort order in windows and linux version  ("Tomi NA" <hefest@gmail.com>)
Re: different sort order in windows and linux version  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general
On Jul 2, 2006, at 6:13 AM, Martijn van Oosterhout wrote:
> But I don't think anyone is actually considering importing ICU into the
> postgres source tree, are they?
Why not?

> Size - I'm not sure this is relevant since I don't think we want to
> incorporate it into postgres itself, just let people use it if they
> have it. In any case though, the default dataset is 8MB. This includes
> support for every locale and charset it knows about.
>
> If you drop the conversion stuff (because postgres already has that)
> you're down to about 4MB.
Why would you drop the ICU transcoding support instead of the existing
postgres functions? Why the duplicated effort?


>> Well, the Japanese think that UTF8 is not the solution to all their
>> worries, so they won't be happy with a UTF8-only solution.  Likewise,
>> those of us who only need single-byte character sets won't be very
>> happy with being forced to accept multi-byte processing overhead.
>
> I've not quite understood the Japanese problem with Unicode. My
> understanding is that it was primarily due to widespread use of broken
> converters.

Certain Japanese characters cannot make a reliable round-trip through
Unicode. ICU uses UTF-16 as its store, so the Japanese folks won't be
happy with an ICU-only solution. However, it would still be of great
benefit to allow ICU to handle as much as possible, leaving the string
encodings to the encoding experts.
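
To make the round-trip concern concrete, here is a minimal Python sketch of
one well-known source of trouble: two Shift_JIS mapping tables that disagree
about which Unicode character a given byte pair corresponds to. It uses
Python's built-in "shift_jis" and "cp932" codecs rather than ICU, and the
helper name round_trips is made up for this illustration.

```python
def round_trips(raw: bytes, decode_with: str, encode_with: str) -> bool:
    """Return True if raw survives bytes -> Unicode -> bytes unchanged."""
    try:
        return raw.decode(decode_with).encode(encode_with) == raw
    except (UnicodeDecodeError, UnicodeEncodeError):
        # A character with no mapping in the target table breaks the trip.
        return False


# The Shift_JIS byte pair for the "wave dash" character.
WAVE_DASH = b"\x81\x60"

# The two tables pick different Unicode code points for the same bytes.
print(hex(ord(WAVE_DASH.decode("shift_jis"))))
print(hex(ord(WAVE_DASH.decode("cp932"))))

# Each table is consistent with itself, but mixing them is not.
print(round_trips(WAVE_DASH, "shift_jis", "shift_jis"))
print(round_trips(WAVE_DASH, "cp932", "cp932"))
print(round_trips(WAVE_DASH, "cp932", "shift_jis"))
```

Each table round-trips on its own; the trip only fails when the decoding and
encoding sides use different tables, which is what tends to happen when data
passes through more than one converter.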

At the very least, it would be great to have ICU handle encoding on a
per-column basis (perhaps by extending the text datatype with encoding
info). Perhaps this would be a decent stopgap solution? The backend
protocol would also need a version bump, since it currently converts
all strings to a single encoding.
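
As a rough sketch of what "text with encoding info" could look like, the
Python below tags each value with its own encoding and converts only when a
client asks for a different one. This is purely hypothetical: the names
EncodedText and to_client are invented here and do not correspond to
anything in postgres or ICU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedText:
    """Hypothetical text value that carries its own encoding."""
    raw: bytes
    encoding: str

    def to_client(self, client_encoding: str) -> bytes:
        # No conversion needed when the client speaks the stored encoding.
        if client_encoding == self.encoding:
            return self.raw
        # Otherwise transcode; this raises UnicodeEncodeError when the
        # client encoding cannot represent the stored text.
        return self.raw.decode(self.encoding).encode(client_encoding)


# Two values of the same "type", stored in different encodings.
name_ja = EncodedText("日本語".encode("euc_jp"), "euc_jp")
name_de = EncodedText("Köln".encode("latin-1"), "latin-1")

print(name_ja.to_client("euc_jp") == name_ja.raw)
print(name_de.to_client("utf-8"))
```

Forcing both values through a single client encoding is where the trouble
starts: name_ja.to_client("latin-1") fails outright, which is the kind of
case a protocol with per-value encoding info would avoid.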

¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬
AgentM
agentm@themactionfaction.com
¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬

