Re: PostgreSQL, UTF-8 and Mac OS X - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Re: PostgreSQL, UTF-8 and Mac OS X
Date
Msg-id 20051107144022.GD841@svana.org
In response to Re: PostgreSQL, UTF-8 and Mac OS X  (Guido Neitzer <guido.neitzer@pharmaline.de>)
List pgsql-general
On Mon, Nov 07, 2005 at 02:28:05PM +0100, Guido Neitzer wrote:
> I think I was the one who asked.
>
> I worked on my locale problem over the weekend and was able to build
> an LC_COLLATE file that actually works with ISO locales, but not with
> UTF-8 (50% progress ... ;-)).

I'd guess the problem is that you have to import the entire Unicode
database to make it work. I think the code is multibyte-aware, though;
it's just that no one has done the work.

Disclaimer: I'm working with Linux/glibc, which has had proper collation
for quite a while now, so I have no real understanding of systems that
don't have it.

> When you test the UNIX utility "sort" on Mac OS X, you should be
> aware that the pre-installed version on Mac OS X ignores locales
> entirely ... :-( I had to install the GNU coreutils to get a sort that
> works with locales, and this also fails on UTF-8 but works with ISO
> encoding/collation - the same as PG does.

Nasty.

> Now I'm not sure whether my own LC_COLLATE file is not appropriate
> for UTF-8 (why not?) or whether the Mac OS X locale code does not
> support UTF-8 at all, as you state.

Hmm, I just went back to the source code (adv_cmds-79.1) and it looks
like collation doesn't support UTF-8 at all, or any other multibyte
encoding.
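
To illustrate why a collation that only understands single bytes breaks
on UTF-8, here is a minimal Python sketch. The weight table is a toy I
made up for this example (it is not Apple's table and is not taken from
adv_cmds); it just places "é" (byte 0xE9 in ISO 8859-15) right after
"e". In UTF-8 the same character is two bytes, so the per-byte lookup
never sees it as a letter.

```python
# Toy per-byte collation weights (hypothetical, for illustration only).
# Every byte defaults to its own value; 0xE9 ("é" in ISO 8859-15) is
# given a weight just after "e".
WEIGHTS = {b: float(b) for b in range(256)}
WEIGHTS[0xE9] = ord("e") + 0.5


def bytewise_key(raw: bytes):
    """Sort key that looks up each byte on its own, with no multibyte decoding."""
    return [WEIGHTS[b] for b in raw]


def sort_words(words, encoding):
    """Sort words by the per-byte weights of their encoded form."""
    return sorted(words, key=lambda w: bytewise_key(w.encode(encoding)))


words = ["fin", "été", "eta"]
latin = sort_words(words, "iso8859_15")  # "é" is the single byte 0xE9
utf8 = sort_words(words, "utf-8")        # "é" is the two bytes 0xC3 0xA9

print(latin)
print(utf8)
```

With the single-byte encoding "été" lands between "eta" and "fin"; with
UTF-8 it is pushed to the end, because its lead byte 0xC3 has no useful
weight in a byte-indexed table.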

> It would be cool to have locale support directly in PostgreSQL.

Yeah, but it seems a bit lame for an operating system to claim to
support multibyte locales if it can't do collation on them. :( It
supports everything but collation, so collation is obviously not a
priority.

> So, just a quick question regarding a switch: is there a problem with
> using ISO8859-15 for now and switching later by dumping the data and
> importing it into a newer version, which should then use UTF-8? Do I
> need to do some conversion, or how does this work?

If you load the data as ISO8859-15 now, then when you do the upgrade,
simply set the client encoding to ISO8859-15 (PostgreSQL's name for it
is LATIN9) and PostgreSQL will convert it all to UTF-8 during the load.
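
As a rough picture of what that conversion amounts to, here is a small
Python sketch that does the same byte-level transformation by hand. It
is not PostgreSQL code; convert_dump and the sample row are names I
made up for the example. In PostgreSQL itself you would normally not
convert anything yourself, just set client_encoding for the session
doing the load (for example with SET client_encoding TO 'LATIN9').

```python
def convert_dump(raw: bytes, source_encoding: str = "iso8859_15") -> bytes:
    """Re-encode bytes from the old encoding to UTF-8 (hypothetical helper)."""
    # Decode with the encoding the data was written in, then encode as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical row from an ISO 8859-15 dump: 0xA4 is the euro sign there.
latin9_row = b"price: 5 \xa4"
utf8_row = convert_dump(latin9_row)

print(utf8_row.decode("utf-8"))
```

The point is that the declared source encoding has to match the actual
bytes. Had the same row been declared as ISO 8859-1, byte 0xA4 would
have come out as the generic currency sign rather than the euro sign.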

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
