Re: Unicode + LC_COLLATE - Mailing list pgsql-general

From John Sidney-Woollett
Subject Re: Unicode + LC_COLLATE
Date
Msg-id 3532.192.168.0.64.1082641480.squirrel@mercury.wardbrook.com
In response to Re: Unicode + LC_COLLATE  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-general
Peter Eisentraut said:
> On Thursday, 22 April 2004 13:17, John Sidney-Woollett wrote:
> You get your strings sorted in binary order of the UTF-8 encoding, which
> is probably not very interesting, but it's possible.

Agreed.
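
As a rough illustration of what "binary order of the UTF-8 encoding" means in practice, here is a minimal Python sketch. It assumes that comparison under LC_COLLATE = 'C' amounts to a byte-by-byte comparison of the encoded strings; the word list and the function name c_locale_sort are made up for the example.

```python
# Hypothetical sample data: a mix of ASCII and accented words.
words = ["zebra", "Zebra", "apple", "éclair", "Éclair"]


def c_locale_sort(strings):
    """Sort by raw UTF-8 bytes (an approximation of 'C' locale ordering)."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


binary_order = c_locale_sort(words)
print(binary_order)
# Uppercase ASCII sorts before lowercase ASCII, and every accented
# letter sorts after all of ASCII because its bytes are >= 0x80.
```

So 'Zebra' lands before 'apple', and both accented words land after 'zebra', which is usually not what a human reader expects from an alphabetical list.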

>> Is it also true that if LC_COLLATE != 'C' that indexes cannot be used
>> for LIKE comparisons (and is this also true for en_US.iso885915)?

> No, see <http://www.postgresql.org/docs/7.4/static/indexes-opclass.html>.

I wish I understood what this page actually was trying to say.

Is it saying that varchar_pattern_ops sorts according to the 'C' locale
regardless of LC_COLLATE, and that varchar_ops sorts according to the
current value of LC_COLLATE?

> This setup will result in UTF-8 characters being sorted by the system
> thinking they are actually ISO-8859-15 characters.  So the result will be
> random at best.

Actually, LC_COLLATE is currently 'C', not ISO-8859-1 as I reported.
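
For reference, the mismatch Peter describes (UTF-8 data handled by an ISO-8859-15 locale) can be sketched in Python as below. This only shows how the bytes would be misread, not how any particular collation routine behaves; the function name misread_as_latin9 is made up for the example.

```python
def misread_as_latin9(text):
    """Encode as UTF-8, then decode the same bytes as ISO-8859-15."""
    return text.encode("utf-8").decode("iso8859_15")


sample = "é"                      # one character, two bytes in UTF-8
garbled = misread_as_latin9(sample)
print(len(sample), len(garbled))  # the misreading sees two characters
print(garbled)
```

Each multi-byte UTF-8 character is seen as two or more unrelated single-byte characters, so any locale-aware comparison would be ranking the wrong characters.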

What would be a correct LC_COLLATE value for my database if we want to
primarily service ISO-8859-1, but also allow for
Cyrillic/Chinese/Japanese/Korean characters and have them sort and
index correctly? We are building a multilanguage website...

ls /usr/share/locale produces:
ca  de  en@boldquot  en_SE  fi  hr  ko            no     sk  zh_TW
cs  el  en_GB        en_US  fr  it  locale.alias  pl     sv
da  en  en@quot      es     gl  ja  nl            pt_BR  tr

Thanks for any more info.

John Sidney-Woollett

