Thread: how to make collation work

how to make collation work

From
Havasvölgyi Ottó
Date:
Hi all,

I am a beginner in Linux and Postgres too, so take this into consideration.
I have installed Fedora Core 3, and PostgreSQL 8.0.2 from source on it. All
right so far. Now I would like to have a Hungarian collation in my tables.
This is what I have tried so far:
I checked what Hungarian locales my Linux supports. These are: hu_HU,
hu_HU.iso88592, hu_HU.utf8, and hungarian. So far I have tried the first and
the second with initdb, set the --locale, --lc-collate and --lc-ctype to
that locale. But after filling a table with Hunagrian characters, and
selecting it all with order by didn't give the result I expected. The
encoding of the database and the client was both LATIN2. Neither worked the
upper() and lower() functions correct with Hungarian characters.

What did I do wrong? Please help.

Thanks,
Otto



Re: how to make collation work

From
tövis
Date:
  This was help me, on Debian Sarge.
  PostgreSQL problems with collation sequence!
  #dpkg-reconfigure locales        - generate - "hu_HU" and NLS hu_UTF8
  #reboot
  #/etc/init.d/postgresql stop
  #su postgres -> postgres@srv02:

  empty directory /var/lib/postgres/data

  /usr/lib/postgresql/bin/initdb -U
postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \
 -D /var/lib/postgres/data -W --pwprompt
  ... set same password for "postgres" root

  cd /var/lib/postgres/data
  rm pg_hba.conf
  rm pg_ident.conf
  rm postgresql.conf

  ln -s /etc/postgresql/pg_hba.conf pg_hba.conf
  ln -s /etc/postgresql/pg_ident.conf pg_ident.conf
  ln -s /etc/postgresql/postgresql.conf postgresql.conf

  recreate users and databases - pg_dump?

----- Original Message -----
From: "Havasvölgyi Ottó" <h.otto@freemail.hu>
To: <pgsql-novice@postgresql.org>
Sent: Sunday, May 01, 2005 8:33 PM
Subject: [NOVICE] how to make collation work


> Hi all,
>
> I am a beginner in Linux and Postgres too, so take this into
> consideration.
> I have installed Fedora Core 3, and PostgreSQL 8.0.2 from source on it.
> All right so far. Now I would like to have a Hungarian collation in my
> tables. This is what I have tried so far:
> I checked what Hungarian locales my Linux supports. These are: hu_HU,
> hu_HU.iso88592, hu_HU.utf8, and hungarian. So far I have tried the first
> and the second with initdb, set the --locale, --lc-collate and --lc-ctype
> to that locale. But after filling a table with Hunagrian characters, and
> selecting it all with order by didn't give the result I expected. The
> encoding of the database and the client was both LATIN2. Neither worked
> the upper() and lower() functions correct with Hungarian characters.
>
> What did I do wrong? Please help.
>
> Thanks,
> Otto
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>


Re: how to make collation work

From
Volkan YAZICI
Date:
Hi,

On 5/1/05, tövis <tovises@freemail.hu> wrote:
>   This was help me, on Debian Sarge.
>   PostgreSQL problems with collation sequence!
>   #dpkg-reconfigure locales        - generate - "hu_HU" and NLS hu_UTF8
>   #reboot

I know it'll be a little bit offtopic (when considered to list
caption), but you don't have to (also you shouldn't) reboot your
machine after a locale generation. Just login again to take changes
effect. (You know, one of the best side of Linux as a server OS is
that you don't have to reboot system after every configuration. Please
give up your Micros~1 habbits. :)

>   #/etc/init.d/postgresql stop
>   #su postgres -> postgres@srv02:
>
>   empty directory /var/lib/postgres/data
>
>   /usr/lib/postgresql/bin/initdb -U
> postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \
>  -D /var/lib/postgres/data -W --pwprompt
>   ... set same password for "postgres" root

-W and --pwprompt is just the same. Just chose one of 'em.

On 5/1/05, "Havasvölgyi Ottó" <h.otto@freemail.hu> wrote:
So far I have tried the first
> and the second with initdb, set the --locale, --lc-collate and --lc-ctype
> to that locale. But after filling a table with Hunagrian characters, and
> selecting it all with order by didn't give the result I expected. The
> encoding of the database and the client was both LATIN2. Neither worked
> the upper() and lower() functions correct with Hungarian characters.

As Tövis used above, you can pass collating locale to the database. If
there're still problems in sorting, I advice you for checking your
compiler version. (Probably gcc.) For instance, there were same
problem for Turkish locale too and we found that it's related with the
compiler, not with PostgreSQL.

And another question. How did you typed your entries to the database?
Using psql or any other client. Would you also try to import from a
file which has Hungarian characters?

Regards.

Re: how to make collation work

From
Bruno Wolff III
Date:
On Sun, May 01, 2005 at 23:28:49 +0300,
  Volkan YAZICI <volkan.yazici@gmail.com> wrote:
>
> I know it'll be a little bit offtopic (when considered to list
> caption), but you don't have to (also you shouldn't) reboot your
> machine after a locale generation. Just login again to take changes
> effect. (You know, one of the best side of Linux as a server OS is
> that you don't have to reboot system after every configuration. Please
> give up your Micros~1 habbits. :)

There is a good reason to reboot after global or server configuration changes.
You want to test that the machine will boot up properly with that
configuration. It is cheap insurance to check this while you are in a
maintenance mode rather than find problems after the machine has gone
down unexpectedly and won't come up properly.

Re: how to make collation work

From
Havasvölgyi Ottó
Date:
Volkan,

I passed the desired collation and ctype to initdb.
The version of gcc is 3.4. Should I upgrade to 4.0 perhaps?
As for filling the table, I did it with psql on the server, setting the
client encoding to LATIN2, the same as in the database. Initially it was set
to WIN1250.
Then I tested the upper() function as well, but they worked only with
English characters. For example:

select upper('á');

I expected 'Á', but I got 'á'. So upper() didn't change this Hungarian
character.

Best Regards,
Otto




----- Original Message -----
From: "Volkan YAZICI" <volkan.yazici@gmail.com>
To: "tövis" <tovises@freemail.hu>
Cc: "PostgreSQL Novice" <pgsql-novice@postgresql.org>
Sent: Sunday, May 01, 2005 10:28 PM
Subject: Re: [NOVICE] how to make collation work


Hi,

On 5/1/05, tövis <tovises@freemail.hu> wrote:
>   This was help me, on Debian Sarge.
>   PostgreSQL problems with collation sequence!
>   #dpkg-reconfigure locales        - generate - "hu_HU" and NLS hu_UTF8
>   #reboot

I know it'll be a little bit offtopic (when considered to list
caption), but you don't have to (also you shouldn't) reboot your
machine after a locale generation. Just login again to take changes
effect. (You know, one of the best side of Linux as a server OS is
that you don't have to reboot system after every configuration. Please
give up your Micros~1 habbits. :)

>   #/etc/init.d/postgresql stop
>   #su postgres -> postgres@srv02:
>
>   empty directory /var/lib/postgres/data
>
>   /usr/lib/postgresql/bin/initdb -U
> postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \
>  -D /var/lib/postgres/data -W --pwprompt
>   ... set same password for "postgres" root

-W and --pwprompt is just the same. Just chose one of 'em.

On 5/1/05, "Havasvölgyi Ottó" <h.otto@freemail.hu> wrote:
So far I have tried the first
> and the second with initdb, set the --locale, --lc-collate and --lc-ctype
> to that locale. But after filling a table with Hunagrian characters, and
> selecting it all with order by didn't give the result I expected. The
> encoding of the database and the client was both LATIN2. Neither worked
> the upper() and lower() functions correct with Hungarian characters.

As Tövis used above, you can pass collating locale to the database. If
there're still problems in sorting, I advice you for checking your
compiler version. (Probably gcc.) For instance, there were same
problem for Turkish locale too and we found that it's related with the
compiler, not with PostgreSQL.

And another question. How did you typed your entries to the database?
Using psql or any other client. Would you also try to import from a
file which has Hungarian characters?

Regards.

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings




Re: how to make collation work

From
tövis
Date:
Exciting!
Unfortunatelly I do not know how to check this quickly on my server;-( Could
you sent some simple sequence to check this out?
Üdv
    Tövis
----- Original Message -----
From: "Havasvölgyi Ottó" <h.otto@freemail.hu>
To: <pgsql-novice@postgresql.org>
Sent: Monday, May 02, 2005 6:19 PM
Subject: Re: [NOVICE] how to make collation work


> Volkan,
>
> I passed the desired collation and ctype to initdb.
> The version of gcc is 3.4. Should I upgrade to 4.0 perhaps?
> As for filling the table, I did it with psql on the server, setting the
> client encoding to LATIN2, the same as in the database. Initially it was
> set
> to WIN1250.
> Then I tested the upper() function as well, but they worked only with
> English characters. For example:
>
> select upper('á');
>
> I expected 'Á', but I got 'á'. So upper() didn't change this Hungarian
> character.
>
> Best Regards,
> Otto
>
>
>
>
> ----- Original Message -----
> From: "Volkan YAZICI" <volkan.yazici@gmail.com>
> To: "tövis" <tovises@freemail.hu>
> Cc: "PostgreSQL Novice" <pgsql-novice@postgresql.org>
> Sent: Sunday, May 01, 2005 10:28 PM
> Subject: Re: [NOVICE] how to make collation work
>
>
> Hi,
>
> On 5/1/05, tövis <tovises@freemail.hu> wrote:
>>   This was help me, on Debian Sarge.
>>   PostgreSQL problems with collation sequence!
>>   #dpkg-reconfigure locales        - generate - "hu_HU" and NLS hu_UTF8
>>   #reboot
>
> I know it'll be a little bit offtopic (when considered to list
> caption), but you don't have to (also you shouldn't) reboot your
> machine after a locale generation. Just login again to take changes
> effect. (You know, one of the best side of Linux as a server OS is
> that you don't have to reboot system after every configuration. Please
> give up your Micros~1 habbits. :)
>
>>   #/etc/init.d/postgresql stop
>>   #su postgres -> postgres@srv02:
>>
>>   empty directory /var/lib/postgres/data
>>
>>   /usr/lib/postgresql/bin/initdb -U
>> postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \
>>  -D /var/lib/postgres/data -W --pwprompt
>>   ... set same password for "postgres" root
>
> -W and --pwprompt is just the same. Just chose one of 'em.
>
> On 5/1/05, "Havasvölgyi Ottó" <h.otto@freemail.hu> wrote:
> So far I have tried the first
>> and the second with initdb, set the --locale, --lc-collate and --lc-ctype
>> to that locale. But after filling a table with Hunagrian characters, and
>> selecting it all with order by didn't give the result I expected. The
>> encoding of the database and the client was both LATIN2. Neither worked
>> the upper() and lower() functions correct with Hungarian characters.
>
> As Tövis used above, you can pass collating locale to the database. If
> there're still problems in sorting, I advice you for checking your
> compiler version. (Probably gcc.) For instance, there were same
> problem for Turkish locale too and we found that it's related with the
> compiler, not with PostgreSQL.
>
> And another question. How did you typed your entries to the database?
> Using psql or any other client. Would you also try to import from a
> file which has Hungarian characters?
>
> Regards.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>


Re: how to make collation work

From
Havasvölgyi Ottó
Date:
Tövis,

Just do this in psql:

\encoding LATIN2 --set client encoding

create table collation_test(c char(1));  --if default encoding is LATIN2

--fill the table
insert into collation_test values('á');
insert into collation_test values('é');
insert into collation_test values('í');
insert into collation_test values('ó');
insert into collation_test values('ö');
insert into collation_test values('ő');
insert into collation_test values('ú');
insert into collation_test values('ü');
insert into collation_test values('ű');
insert into collation_test values('Á');
insert into collation_test values('É');
insert into collation_test values('Í');
insert into collation_test values('Ó');
insert into collation_test values('Ö');
insert into collation_test values('Ő');
insert into collation_test values('Ú');
insert into collation_test values('Ü');
insert into collation_test values('Ű');
insert into collation_test values('A');
insert into collation_test values('g');
insert into collation_test values('k');
insert into collation_test values('S');

select * from collation_test order by c;

select upper('á'); --'á' for me

select lower('Á'); --'Á' for me I guess



I got all characters correctly in the table query, they were just in the
wrong order.

However, the upper() didn't work correctly for me. I did not try lower()
after that.

By the way what glibc version do you use? How did you manage to get
Hungarian collation work correctly?

Üdv / Best Regards,
Otto


----- Original Message -----
From: "tövis" <tovises@freemail.hu>
To: "pgsql novice" <pgsql-novice@postgresql.org>; "Havasvölgyi Ottó"
<h.otto@freemail.hu>
Sent: Monday, May 02, 2005 6:36 PM
Subject: Re: [NOVICE] how to make collation work


> Exciting!
> Unfortunatelly I do not know how to check this quickly on my server;-(
> Could you sent some simple sequence to check this out?
> Üdv
>    Tövis
> ----- Original Message -----
> From: "Havasvölgyi Ottó" <h.otto@freemail.hu>
> To: <pgsql-novice@postgresql.org>
> Sent: Monday, May 02, 2005 6:19 PM
> Subject: Re: [NOVICE] how to make collation work
>
>
>> Volkan,
>>
>> I passed the desired collation and ctype to initdb.
>> The version of gcc is 3.4. Should I upgrade to 4.0 perhaps?
>> As for filling the table, I did it with psql on the server, setting the
>> client encoding to LATIN2, the same as in the database. Initially it was
>> set
>> to WIN1250.
>> Then I tested the upper() function as well, but they worked only with
>> English characters. For example:
>>
>> select upper('á');
>>
>> I expected 'Á', but I got 'á'. So upper() didn't change this Hungarian
>> character.
>>
>> Best Regards,
>> Otto
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Volkan YAZICI" <volkan.yazici@gmail.com>
>> To: "tövis" <tovises@freemail.hu>
>> Cc: "PostgreSQL Novice" <pgsql-novice@postgresql.org>
>> Sent: Sunday, May 01, 2005 10:28 PM
>> Subject: Re: [NOVICE] how to make collation work
>>
>>
>> Hi,
>>
>> On 5/1/05, tövis <tovises@freemail.hu> wrote:
>>>   This was help me, on Debian Sarge.
>>>   PostgreSQL problems with collation sequence!
>>>   #dpkg-reconfigure locales        - generate - "hu_HU" and NLS hu_UTF8
>>>   #reboot
>>
>> I know it'll be a little bit offtopic (when considered to list
>> caption), but you don't have to (also you shouldn't) reboot your
>> machine after a locale generation. Just login again to take changes
>> effect. (You know, one of the best side of Linux as a server OS is
>> that you don't have to reboot system after every configuration. Please
>> give up your Micros~1 habbits. :)
>>
>>>   #/etc/init.d/postgresql stop
>>>   #su postgres -> postgres@srv02:
>>>
>>>   empty directory /var/lib/postgres/data
>>>
>>>   /usr/lib/postgresql/bin/initdb -U
>>> postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \
>>>  -D /var/lib/postgres/data -W --pwprompt
>>>   ... set same password for "postgres" root
>>
>> -W and --pwprompt is just the same. Just chose one of 'em.
>>
>> On 5/1/05, "Havasvölgyi Ottó" <h.otto@freemail.hu> wrote:
>> So far I have tried the first
>>> and the second with initdb, set the --locale, --lc-collate
>>> and --lc-ctype
>>> to that locale. But after filling a table with Hunagrian characters, and
>>> selecting it all with order by didn't give the result I expected. The
>>> encoding of the database and the client was both LATIN2. Neither worked
>>> the upper() and lower() functions correct with Hungarian characters.
>>
>> As Tövis used above, you can pass collating locale to the database. If
>> there're still problems in sorting, I advice you for checking your
>> compiler version. (Probably gcc.) For instance, there were same
>> problem for Turkish locale too and we found that it's related with the
>> compiler, not with PostgreSQL.
>>
>> And another question. How did you typed your entries to the database?
>> Using psql or any other client. Would you also try to import from a
>> file which has Hungarian characters?
>>
>> Regards.
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 7: don't forget to increase your free space map settings
>>
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>
>



Re: how to make collation work

From
Havasvölgyi Ottó
Date:
Volkan,

I did not have the problem you had had, I mean the missing Hungarian
(Turkish in your case) capital letters. I can load them into the table, and
in the query all letters appear correctly, just the order is wrong.
As the docs says, for upper() and lower() the lc-ctype switch is
responsible. This determines what is a letter, and what letter is the
capital pair of another letter.
I have applied both lc-ctype, lc-collate switches with the hu_HU locale with
no result. I will check glibc. Thanks for the suggestion.

Best Regards,
Otto



----- Original Message -----
From: "Volkan YAZICI" <volkan.yazici@gmail.com>
To: "Havasvölgyi Ottó" <h.otto@freemail.hu>
Sent: Monday, May 02, 2005 6:42 PM
Subject: Re: [NOVICE] how to make collation work


> Hi,
>
> On 5/2/05, Havasvölgyi Ottó <h.otto@freemail.hu> wrote:
>> I passed the desired collation and ctype to initdb.
>> The version of gcc is 3.4. Should I upgrade to 4.0 perhaps?
>
> Sorry I was mistaken. Just replace "gcc" words with "glibc" in my
> previous reply.
>
> We have a similar problem for turkish characters too. For instance,
> upper('ı') (dotless i) doesn't work on glibc < 2.3.4 version. I'm not
> sure if Hungarian chars. have the same problem with glibc. I can only
> advice you checking bug lists for Hungarian localization of glibc.
>
>> select upper('á');
>>
>> I expected 'Á', but I got 'á'. So upper() didn't change this Hungarian
>> character.
>
> A similar scenario:
>
> $ dpkg -l | grep libc6
> ii  libc6          2.3.2.ds1-21   GNU C Library: Shared libraries and
> Timezone
> ii  libc6-dev      2.3.2.ds1-21   GNU C Library: Development Libraries and
> Hea
> $ psql -l | grep sil
> sil       | knt      | LATIN5
> $ psql sil
> sil=> SET client_encoding TO LATIN5;
> SET
> sil=> SELECT upper('ı');
> upper
> -------
> ı
> (1 row)
> -- It's not working.
>
> Furthermore, problems are not bounded with the above ones for Turkish
> chars. You have to pass "-E UNICODE" param. to initdb too. (ISO-8859-9
> seems like a little bit buggy in here.) Anyway, I'm not so experienced
> on localization. Just summarizing some threads discussed in Turkish
> localization mailing lists. (Mostly, Devrim Gündüz worked on these
> problems.)
>
> On the other hand, tövis said that he achieved to solve problems with
> Hungarian chars. Maybe you can check your glibc and PostgreSQL
> versions with him.
>
> Regards.
>