Thread: How to setup a good collation?

How to setup a good collation?

From
Nagy László Zsolt
Date:
Example database on a Unix system:

CREATE DATABASE test
       TEMPLATE = template0
       ENCODING = 'UTF8'
       LC_COLLATE = 'hu_HU.UTF-8'
       LC_CTYPE = 'hu_HU.UTF-8'
       CONNECTION LIMIT = -1;
\c test
create table a (name text collate "hu_HU.UTF-8");
insert into a values ('a');
insert into a values ('á');
insert into a values ('Á');
insert into a values ('b');
insert into a values ('e');
insert into a values ('é');
insert into a values ('É');
insert into a values ('o');
insert into a values ('ó');
insert into a values ('ö');
insert into a values ('U');
insert into a values ('Ü');
insert into a values ('Z');


select name from a order by upper(name) asc;


Results in:

"a"
"b"
"e"
"o"
"U"
"Z"
"á"
"Á"
"é"
"É"
"ó"
"ö"
"Ü"

Which is bad. The good order should be the following:

"a"
"á"
"Á"
"b"
"e"
"é"
"É"
"o"
"ó"
"ö"
"U"
"Ü"
"Z"

What is wrong with my collaction setup?

Thanks,

   Laszlo



Re: How to setup a good collation?

From
Nagy László Zsolt
Date:
To answer my own question: it seems that the problem is not with
postgresql but with the FreeBSD OS. FreeBSD does not support sorting by
unicode characters, so I need to recompile the postgresql server with
"use ICU for unicode collation" flag.

2016.03.06. 10:13 keltezéssel, Nagy László Zsolt írta:
> Example database on a Unix system:
>
> CREATE DATABASE test
>        TEMPLATE = template0
>        ENCODING = 'UTF8'
>        LC_COLLATE = 'hu_HU.UTF-8'
>        LC_CTYPE = 'hu_HU.UTF-8'
>        CONNECTION LIMIT = -1;



Re: How to setup a good collation?

From
Nagy László Zsolt
Date:
All right, I have reinstalled postgresql server with "Use ICU for
unicode collation" but it did not help!

What else do I need to make this work?

   Thanks



Re: How to setup a good collation?

From
"Reko Turja"
Date:
> All right, I have reinstalled postgresql server with "Use ICU for
> unicode collation" but it did not help!

Postgres 9.3 on FreeBSD 10.2 with kernel, userland and software from
ports compiled with clang. Server compile flags include ICU in
addition of the defaults and client flags using non-GPL libedit
instead of readline.

test=# select name from a order by upper(name) asc;
name
------
a
á
Á
b
e
é
É
o
ó
ö
U
Ü
Z
(13 rows)
test=#

-Reko



Re: How to setup a good collation?

From
Nagy László Zsolt
Date:

>> All right, I have reinstalled postgresql server with "Use ICU for
>> unicode collation" but it did not help!
>
> Postgres 9.3 on FreeBSD 10.2 with kernel, userland and software from
> ports compiled with clang. Server compile flags include ICU in
> addition of the defaults and client flags using non-GPL libedit
> instead of readline.
Thank you for testing this! I have 9.4 but probably that doesn't matter.
I'm not sure what is causing the problem on my box. I'll be installing a
new virtual machine and  try other variations/settings. Will be back
with the results soon.


Re: How to setup a good collation? [SOLVED]

From
Nagy László Zsolt
Date:
>>> All right, I have reinstalled postgresql server with "Use ICU for
>>> unicode collation" but it did not help!
>> Postgres 9.3 on FreeBSD 10.2 with kernel, userland and software from
>> ports compiled with clang. Server compile flags include ICU in
>> addition of the defaults and client flags using non-GPL libedit
>> instead of readline.
> Thank you for testing this! I have 9.4 but probably that doesn't matter.
> I'm not sure what is causing the problem on my box. I'll be installing a
> new virtual machine and  try other variations/settings. Will be back
> with the results soon.
I have tried to do this the cleanest way possible, and not it is
working. Thank you for your help!

I'm going to provide detailed steps below to help others with similar
problems.

Steps taken:

#1. Installed a new virtual machine with the latest 10.2 RELEASE amd64
#2. Installed all available system updates via freebsd-update
#3. Downloaded the latest ports tree with "portsnap fetch; portsnap
extract".
#4. Configured PostgreSQL 9.4.6 server port with the following options:

* ICU - Use ICU for unicode collation
* INTDATE - builds with 64-bit date/time type
* NLS - Use internationalized messages
* SSL - build with openssl support
* TZDATA
* XML

There are the defaults, except for ICU. I have used 9.4 instead of 9.5
because that is the default postgresql server in FreeBSD 10.2

#5. I have compiled and installed the server from the ports tree. This
was a fresh BSD install, so all of the port dependencies were also
installed from source.
#6. Added this to /etc/login.conf:

postgres:\
    :lang=hu_HU.UTF-8:\
    :setenv=LC_COLLATE=hu_HU.UTF-8:\
    :tc=default:

and ran "cap_mkdb /etc/login.conf"

#7 Database initialization this way:

su -l pgsql
 initdb  --encoding=UTF-8 --lc-collate=hu_HU.UTF-8
--lc-ctype=hu_HU.UTF-8 --locale=hu_HU.UTF-8 -D /usr/local/pgsql/data

(Note: list all available locales with "locale -a" command)

#8. Created a new user and database:


template1=# create user test with password 'test' createdb;
CREATE ROLE
template1=# create database test with owner=test template=template0
encoding='UTF-8' lc_collate='hu_HU.UTF-8' lc_ctype='hu_HU.UTF-8';
CREATE DATABASE

Finally, got it working:

create table a (name text );
insert into a values ('a');
insert into a values ('á');
insert into a values ('Á');
insert into a values ('b');
insert into a values ('e');
insert into a values ('é');
insert into a values ('É');
insert into a values ('o');
insert into a values ('ó');
insert into a values ('ö');
insert into a values ('U');
insert into a values ('Ü');
insert into a values ('Z');


select name from a order by upper(name) asc;

"a"
"á"
"Á"
"b"
"e"
"é"
"É"
"o"
"ó"
"ö"
"U"
"Ü"
"Z"

I'm not sure what was the problem. There many places where I could have
forgotten something.