Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode? - Mailing list pgsql-general

From Steve Rogerson
Subject Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?
Msg-id 10a9b3be-f813-085c-d6a7-285f6ae3f82b@yewtc.demon.co.uk
In response to Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On 21/12/16 05:24, Tom Lane wrote:
> James Zhou <james@360data.ca> writes:
>>       - *But their sorting order seems to be undefined. Can anyone comment
>>       the sorting rules?*
>
> Well, it would depend on lc_collate, which you have not told us, and
> it would also depend on how well your platform's strcoll() function
> implements that collation; but you have not told us what platform this
> is running on.

As I understand it, when you first initialise pg with initdb, it inherits the
collation of the process that runs initdb.
Having said that, see:

https://www.postgresql.org/docs/9.6/static/collation.html

"If the operating system provides support for using multiple locales within a
single program (newlocale and related functions), then when a database cluster
is initialized, initdb populates the system catalog pg_collation with
collations based on all the locales it finds on the operating system at the time."

So pg is capable, in principle at least, of using any of the locales
available at the time that initdb is run.

>
> Most of the other behaviors you mention are also partly or wholly
> dependent on which software you use with Postgres and whether you've
> correctly configured that software.  So it's pretty hard to answer
> this usefully with only this much info.
>

The more recent versions of perl (see http://perldoc.perl.org/perlunicode.htm),
and maybe other languages, know not only about code points but also about
"graphemes", so in the appropriate context "LATIN CAPITAL LETTER E WITH ACUTE"
can be considered to be "equal" to "LATIN CAPITAL LETTER E" together with
"COMBINING ACUTE ACCENT", although they are 1 and 2 unicode characters
respectively. This affects notions of equality as well as collation, and it
has implications for pg varchar(N) fields etc.
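
To make the point concrete, here is a minimal sketch in Python rather than
perl or SQL. It only illustrates the code point vs. grapheme distinction and
says nothing about what pg does internally. The helper name same_text is
made up for this example; unicodedata.normalize is the standard library call.

```python
import unicodedata

# U+00C9 LATIN CAPITAL LETTER E WITH ACUTE: one code point
precomposed = "\u00c9"
# U+0045 LATIN CAPITAL LETTER E + U+0301 COMBINING ACUTE ACCENT: two code points
decomposed = "E\u0301"


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison works on code points, so the two spellings differ.
print(precomposed == decomposed)            # False
# The lengths differ too, which is what matters for a length-limited field.
print(len(precomposed), len(decomposed))    # 1 2
# After normalisation the two spellings compare equal.
print(same_text(precomposed, decomposed))   # True
```

Both strings display as a single "É", but a length limit counted in code
points would treat them as 1 and 2 characters respectively.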

I would be interested to know what support pg has/will have for graphemes.

Steve


