Re: Unicode + LC_COLLATE - Mailing list pgsql-general

From Peter Eisentraut
Subject Re: Unicode + LC_COLLATE
Msg-id 200404221539.05444.peter_e@gmx.net
In response to Unicode + LC_COLLATE  ("John Sidney-Woollett" <johnsw@wardbrook.com>)
List pgsql-general
On Thursday, 22 April 2004 13:17, John Sidney-Woollett wrote:
> Does anyone know what the effect of --lc-collate=C --encoding=UNICODE will
> be for sorts (and indexes?) when a multibyte unicode character is
> encountered?

Your strings are sorted in the binary (byte) order of their UTF-8 encoding,
which is the same as Unicode code point order.  That is rarely a
linguistically useful order, but it works.
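
To illustrate what byte order means in practice, here is a minimal Python
sketch (not PostgreSQL itself; the word list is made up for the example).
It sorts strings by their UTF-8 bytes, which is roughly what a comparison
under LC_COLLATE=C amounts to:

```python
# Hypothetical sample words; escapes are used so the code points are explicit.
words = ["zebra", "Zebra", "apple", "\u00e9clair", "\u00c4rger"]  # eclair with e-acute, Aerger with A-umlaut

# Sort by the raw UTF-8 byte sequence of each string.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Python compares str values by code point, which gives the same order,
# because UTF-8 byte order preserves code point order.
by_code_point = sorted(words)

print(by_bytes)
print(by_bytes == by_code_point)
```

All uppercase ASCII letters sort before all lowercase ones, and every
accented letter sorts after "z", which is why this order is seldom what a
human reader expects.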

> Is it also true that if LC_COLLATE != 'C' that indexes cannot be used for
> LIKE comparisons (and is this also true for en_US.iso885915)?

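No.  In a non-C locale, an index built with one of the pattern operator
classes (for example text_pattern_ops) can be used for LIKE comparisons; see
<http://www.postgresql.org/docs/7.4/static/indexes-opclass.html>.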

> Our database is UNICODE with LC_COLLATE=en_US.iso885915. Does anyone know
> what the effect of someone storing a cyrillic/chinese or korean character
> is?

With this setup, the system sorts the UTF-8 data as if its bytes were
ISO-8859-15 characters, so each multibyte character is compared as a sequence
of unrelated single-byte characters.  The resulting order will be meaningless
at best.
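
As a rough illustration of that mismatch, the following Python sketch shows
what a component that assumes ISO-8859-15 "sees" when it is handed UTF-8
bytes.  The helper name as_latin9 and the sample characters are made up for
the example; this does not reproduce the actual collation routines of the C
library.

```python
def as_latin9(text):
    """Encode text as UTF-8, then misread those bytes as ISO-8859-15."""
    return text.encode("utf-8").decode("iso8859-15")

# Hypothetical samples: a Latin letter, a Cyrillic letter, a CJK ideograph.
samples = ["\u00e9", "\u042f", "\u4e2d"]

for ch in samples:
    misread = as_latin9(ch)
    # One real character turns into two or three unrelated Latin-9 characters.
    print(repr(ch), "->", repr(misread), "(%d characters)" % len(misread))
```

The locale's collation rules are then applied to those unrelated characters
rather than to the character that was actually stored.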

> (We are using JDBC with a webapp so all the unicode concerns are
> handled transparently, apparently). When the data is extracted from the DB
> will it render correctly in the browser provided we send all responses
> encoded in UTF-8?

If your database is in UNICODE and you're using JDBC then you should be all
set as far as PostgreSQL is concerned.  Of course, your HTML pages need to
declare the encoding correctly as well.
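
A minimal sketch of declaring the encoding consistently, written in Python
for illustration rather than in the Java stack of the original webapp.  The
function name build_response and the page template are assumptions for the
example; the point is only that the HTTP header, the HTML meta declaration,
and the actual byte encoding all agree on UTF-8.

```python
from html import escape

# Hypothetical page template; the meta tag declares the encoding in the HTML.
PAGE = (
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"><title>demo</title></head>'
    "<body><p>{}</p></body></html>"
)

def build_response(text):
    """Return (headers, body) for a page whose declared and real encodings match."""
    body = PAGE.format(escape(text)).encode("utf-8")  # bytes really are UTF-8
    headers = {
        "Content-Type": "text/html; charset=utf-8",   # declared in the HTTP header
        "Content-Length": str(len(body)),             # length in bytes, not characters
    }
    return headers, body

headers, body = build_response("\u042f \u4e2d")  # Cyrillic and CJK sample text
print(headers["Content-Type"], len(body))
```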
