Re: Collate order on Mac OS X, text with diacritics in UTF-8 - Mailing list pgsql-general

From Martin Flahault
Subject Re: Collate order on Mac OS X, text with diacritics in UTF-8
Date
Msg-id 2BAC69E9-7738-4F03-A149-83DC9F80729C@billjobs.com
In response to Re: Collate order on Mac OS X, text with diacritics in UTF-8  (Craig Ringer <craig@postnewspapers.com.au>)
Responses Re: Collate order on Mac OS X, text with diacritics in UTF-8  (Martijn van Oosterhout <kleptog@svana.org>)
Re: Collate order on Mac OS X, text with diacritics in UTF-8  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-general

Here is an example:

postgres=# create database newbase;
CREATE DATABASE
postgres=# \c newbase;
psql (8.4.2)
You are now connected to database "newbase".
newbase=# create table t1 (contenu text);
CREATE TABLE
newbase=# insert into t1 values ('a'), ('e'), ('à'), ('é'), ('A'), ('E');
INSERT 0 6

newbase=# select * from t1 order by contenu;
 contenu 
---------
 A
 E
 a
 e
 à
 é
(6 rows)

newbase=# select * from t1 order by upper(contenu);
 contenu 
---------
 a
 A
 e
 E
 à
 é
(6 rows)


Here is the encoding information:

newbase=# \encoding
UTF8
newbase=# show lc_collate;
 lc_collate 
------------
 fr_FR
(1 row)

newbase=# show lc_ctype;
 lc_ctype 
----------
 fr_FR
(1 row)


As with other DBMSs (MySQL, for example), diacritics should be ignored when determining the sort order. Here is the expected output:
 a
 à
 A
 e
 é 
 E
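
To make the expected order concrete, here is a minimal Python sketch of a sort key that produces it. It is only an illustration of the ordering rule (base letter first, then lower case before upper case, then unaccented before accented); the names strip_diacritics and expected_key are hypothetical, and this is not how PostgreSQL or MySQL implement collation.

```python
import unicodedata

def strip_diacritics(s):
    # Decompose to NFD so accents become separate combining marks, then drop them.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def expected_key(s):
    # Hypothetical key: base letters first, then case, then presence of a diacritic.
    base = strip_diacritics(s)
    return (base.lower(), base != base.lower(), s != base)

values = ["a", "e", "à", "é", "A", "E"]
print(sorted(values, key=expected_key))
# → ['a', 'à', 'A', 'e', 'é', 'E']
```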


There seems to be a problem with the collation order of text with diacritics in UTF-8 on BSD systems.
If you put this text:
a
A
à
é
e
E

in a UTF-8 text file and run the "sort" command on it, you get the same wrong output as with PostgreSQL:
A
E
a
e
à
é
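
For what it is worth, this output is exactly what a plain code point (or UTF-8 byte) comparison gives, as the short Python sketch below shows. That the BSD locale actually falls back to such a comparison is an assumption on my part; the sketch only shows that the observed order is consistent with it.

```python
words = ["a", "A", "à", "é", "e", "E"]

# Python compares str values by Unicode code point, with no locale rules applied.
by_code_point = sorted(words)

# Comparing the encoded UTF-8 bytes gives the same order for these characters.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_code_point)
# → ['A', 'E', 'a', 'e', 'à', 'é']
print(by_utf8_bytes == by_code_point)
# → True
```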

Hope this helps,

Martin
