Thread: UNICODE string collating, case insensitive matching

From: "Cestmir Hybl Jr."
Date: 04 March 2003, 16:02:59

Hello,

(1) I have a question about multibyte support in PostgreSQL:

Why do collation and character case operations (Upper, Lower, ILIKE) in Postgres use libc locales instead of the UNICODE specification when the database encoding is UTF-8? This is useless in a real multilingual environment, where strings in multiple languages are stored in the same database. Those strings can NOT be handled by a single locale.

There are several UNICODE technical standards relevant to this:

http://www.unicode.org/reports/tr10/ - Unicode Collation Algorithm

http://www.unicode.org/reports/tr21/ - Case Mappings
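
To illustrate what locale-independent case handling looks like, here is a minimal sketch in Python (used only for illustration; it is not part of PostgreSQL). It relies on Python's built-in str.upper(), str.lower() and str.casefold(), which follow the Unicode case mappings rather than the libc locale. The function names and the sample rows are hypothetical.

```python
# Illustration only: Python's str methods follow the Unicode case mappings
# and do not consult the libc locale.

def unicode_upper(text: str) -> str:
    """Upper-case a string using Unicode case mappings."""
    return text.upper()

def unicode_lower(text: str) -> str:
    """Lower-case a string using Unicode case mappings."""
    return text.lower()

def ci_contains(haystack: str, needle: str) -> bool:
    """Rough analogue of ILIKE '%needle%' using Unicode case folding."""
    return needle.casefold() in haystack.casefold()

# Hypothetical sample rows: Slovak, German, French and Russian in one column.
rows = ["žltý kôň", "Ärger", "élan", "привет"]

for row in rows:
    print(row, "->", unicode_upper(row))

print(ci_contains("Žltý kôň", "KÔŇ"))
```

This covers case mapping only; it does not attempt the collation algorithm described in the first report.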

(2) Is there anyone who has a pgsql database cluster with UTF-8 encoding and a *.UTF-8 locale in which the Upper, Lower and ILIKE functions work properly?

I have compiled the sk_SK.UTF-8 locale and string collating works fine (a /select ... order by some_field/ query returns a properly collated dataset), but /select Upper(some_field), Lower(some_field)/ and /select ... where some_field ILIKE '%...some non-ASCII text...%'/ do not work.

All of this works fine in the sk_SK.ISO-8859-2 locale.
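
The difference between the two locales can be sketched as follows. This is only one possible explanation, and it rests on an assumption: that the case functions process a string one byte at a time. Nothing in this post establishes how PostgreSQL actually implements them. The helper bytewise_upper is hypothetical and written in Python purely for illustration.

```python
def bytewise_upper(data: bytes, encoding: str) -> bytes:
    """Upper-case each byte in isolation, as a single-byte routine would.

    Assumption for illustration: the routine never looks at more than one
    byte, so it cannot recognise a multi-byte character.
    """
    out = bytearray()
    for b in data:
        try:
            ch = bytes([b]).decode(encoding)
            up = ch.upper().encode(encoding)
        except (UnicodeDecodeError, UnicodeEncodeError):
            # A lone byte of a multi-byte sequence (or an unmappable result)
            # is passed through unchanged.
            out.append(b)
            continue
        # Keep the original byte if the upper-case form is not one byte.
        out.extend(up if len(up) == 1 else bytes([b]))
    return bytes(out)

word = "žltý"

# ISO-8859-2: every character is one byte, so byte-wise mapping can work.
latin2 = bytewise_upper(word.encode("iso-8859-2"), "iso-8859-2")
print(latin2.decode("iso-8859-2"))

# UTF-8: the accented characters take two bytes each and are left untouched.
utf8 = bytewise_upper(word.encode("utf-8"), "utf-8")
print(utf8.decode("utf-8"))
```

Under that assumption, only the ASCII letters change in the UTF-8 case, which would match the symptom described above.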

Cestmir Hybl