Thread: how to make collation work
Hi all, I am a beginner in Linux and Postgres too, so take this into consideration. I have installed Fedora Core 3, and PostgreSQL 8.0.2 from source on it. All right so far. Now I would like to have a Hungarian collation in my tables. This is what I have tried so far: I checked what Hungarian locales my Linux supports. These are: hu_HU, hu_HU.iso88592, hu_HU.utf8, and hungarian. So far I have tried the first and the second with initdb, set the --locale, --lc-collate and --lc-ctype to that locale. But after filling a table with Hunagrian characters, and selecting it all with order by didn't give the result I expected. The encoding of the database and the client was both LATIN2. Neither worked the upper() and lower() functions correct with Hungarian characters. What did I do wrong? Please help. Thanks, Otto
This was help me, on Debian Sarge. PostgreSQL problems with collation sequence! #dpkg-reconfigure locales - generate - "hu_HU" and NLS hu_UTF8 #reboot #/etc/init.d/postgresql stop #su postgres -> postgres@srv02: empty directory /var/lib/postgres/data /usr/lib/postgresql/bin/initdb -U postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \ -D /var/lib/postgres/data -W --pwprompt ... set same password for "postgres" root cd /var/lib/postgres/data rm pg_hba.conf rm pg_ident.conf rm postgresql.conf ln -s /etc/postgresql/pg_hba.conf pg_hba.conf ln -s /etc/postgresql/pg_ident.conf pg_ident.conf ln -s /etc/postgresql/postgresql.conf postgresql.conf recreate users and databases - pg_dump? ----- Original Message ----- From: "Havasvölgyi Ottó" <h.otto@freemail.hu> To: <pgsql-novice@postgresql.org> Sent: Sunday, May 01, 2005 8:33 PM Subject: [NOVICE] how to make collation work > Hi all, > > I am a beginner in Linux and Postgres too, so take this into > consideration. > I have installed Fedora Core 3, and PostgreSQL 8.0.2 from source on it. > All right so far. Now I would like to have a Hungarian collation in my > tables. This is what I have tried so far: > I checked what Hungarian locales my Linux supports. These are: hu_HU, > hu_HU.iso88592, hu_HU.utf8, and hungarian. So far I have tried the first > and the second with initdb, set the --locale, --lc-collate and --lc-ctype > to that locale. But after filling a table with Hunagrian characters, and > selecting it all with order by didn't give the result I expected. The > encoding of the database and the client was both LATIN2. Neither worked > the upper() and lower() functions correct with Hungarian characters. > > What did I do wrong? Please help. > > Thanks, > Otto > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
Hi, On 5/1/05, tövis <tovises@freemail.hu> wrote: > This was help me, on Debian Sarge. > PostgreSQL problems with collation sequence! > #dpkg-reconfigure locales - generate - "hu_HU" and NLS hu_UTF8 > #reboot I know it'll be a little bit offtopic (when considered to list caption), but you don't have to (also you shouldn't) reboot your machine after a locale generation. Just login again to take changes effect. (You know, one of the best side of Linux as a server OS is that you don't have to reboot system after every configuration. Please give up your Micros~1 habbits. :) > #/etc/init.d/postgresql stop > #su postgres -> postgres@srv02: > > empty directory /var/lib/postgres/data > > /usr/lib/postgresql/bin/initdb -U > postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \ > -D /var/lib/postgres/data -W --pwprompt > ... set same password for "postgres" root -W and --pwprompt is just the same. Just chose one of 'em. On 5/1/05, "Havasvölgyi Ottó" <h.otto@freemail.hu> wrote: So far I have tried the first > and the second with initdb, set the --locale, --lc-collate and --lc-ctype > to that locale. But after filling a table with Hunagrian characters, and > selecting it all with order by didn't give the result I expected. The > encoding of the database and the client was both LATIN2. Neither worked > the upper() and lower() functions correct with Hungarian characters. As Tövis used above, you can pass collating locale to the database. If there're still problems in sorting, I advice you for checking your compiler version. (Probably gcc.) For instance, there were same problem for Turkish locale too and we found that it's related with the compiler, not with PostgreSQL. And another question. How did you typed your entries to the database? Using psql or any other client. Would you also try to import from a file which has Hungarian characters? Regards.
On Sun, May 01, 2005 at 23:28:49 +0300, Volkan YAZICI <volkan.yazici@gmail.com> wrote: > > I know it'll be a little bit offtopic (when considered to list > caption), but you don't have to (also you shouldn't) reboot your > machine after a locale generation. Just login again to take changes > effect. (You know, one of the best side of Linux as a server OS is > that you don't have to reboot system after every configuration. Please > give up your Micros~1 habbits. :) There is a good reason to reboot after global or server configuration changes. You want to test that the machine will boot up properly with that configuration. It is cheap insurance to check this while you are in a maintenance mode rather than find problems after the machine has gone down unexpectedly and won't come up properly.
Volkan, I passed the desired collation and ctype to initdb. The version of gcc is 3.4. Should I upgrade to 4.0 perhaps? As for filling the table, I did it with psql on the server, setting the client encoding to LATIN2, the same as in the database. Initially it was set to WIN1250. Then I tested the upper() function as well, but they worked only with English characters. For example: select upper('á'); I expected 'Á', but I got 'á'. So upper() didn't change this Hungarian character. Best Regards, Otto ----- Original Message ----- From: "Volkan YAZICI" <volkan.yazici@gmail.com> To: "tövis" <tovises@freemail.hu> Cc: "PostgreSQL Novice" <pgsql-novice@postgresql.org> Sent: Sunday, May 01, 2005 10:28 PM Subject: Re: [NOVICE] how to make collation work Hi, On 5/1/05, tövis <tovises@freemail.hu> wrote: > This was help me, on Debian Sarge. > PostgreSQL problems with collation sequence! > #dpkg-reconfigure locales - generate - "hu_HU" and NLS hu_UTF8 > #reboot I know it'll be a little bit offtopic (when considered to list caption), but you don't have to (also you shouldn't) reboot your machine after a locale generation. Just login again to take changes effect. (You know, one of the best side of Linux as a server OS is that you don't have to reboot system after every configuration. Please give up your Micros~1 habbits. :) > #/etc/init.d/postgresql stop > #su postgres -> postgres@srv02: > > empty directory /var/lib/postgres/data > > /usr/lib/postgresql/bin/initdb -U > postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \ > -D /var/lib/postgres/data -W --pwprompt > ... set same password for "postgres" root -W and --pwprompt is just the same. Just chose one of 'em. On 5/1/05, "Havasvölgyi Ottó" <h.otto@freemail.hu> wrote: So far I have tried the first > and the second with initdb, set the --locale, --lc-collate and --lc-ctype > to that locale. But after filling a table with Hunagrian characters, and > selecting it all with order by didn't give the result I expected. The > encoding of the database and the client was both LATIN2. Neither worked > the upper() and lower() functions correct with Hungarian characters. As Tövis used above, you can pass collating locale to the database. If there're still problems in sorting, I advice you for checking your compiler version. (Probably gcc.) For instance, there were same problem for Turkish locale too and we found that it's related with the compiler, not with PostgreSQL. And another question. How did you typed your entries to the database? Using psql or any other client. Would you also try to import from a file which has Hungarian characters? Regards. ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
Exciting! Unfortunatelly I do not know how to check this quickly on my server;-( Could you sent some simple sequence to check this out? Üdv Tövis ----- Original Message ----- From: "Havasvölgyi Ottó" <h.otto@freemail.hu> To: <pgsql-novice@postgresql.org> Sent: Monday, May 02, 2005 6:19 PM Subject: Re: [NOVICE] how to make collation work > Volkan, > > I passed the desired collation and ctype to initdb. > The version of gcc is 3.4. Should I upgrade to 4.0 perhaps? > As for filling the table, I did it with psql on the server, setting the > client encoding to LATIN2, the same as in the database. Initially it was > set > to WIN1250. > Then I tested the upper() function as well, but they worked only with > English characters. For example: > > select upper('á'); > > I expected 'Á', but I got 'á'. So upper() didn't change this Hungarian > character. > > Best Regards, > Otto > > > > > ----- Original Message ----- > From: "Volkan YAZICI" <volkan.yazici@gmail.com> > To: "tövis" <tovises@freemail.hu> > Cc: "PostgreSQL Novice" <pgsql-novice@postgresql.org> > Sent: Sunday, May 01, 2005 10:28 PM > Subject: Re: [NOVICE] how to make collation work > > > Hi, > > On 5/1/05, tövis <tovises@freemail.hu> wrote: >> This was help me, on Debian Sarge. >> PostgreSQL problems with collation sequence! >> #dpkg-reconfigure locales - generate - "hu_HU" and NLS hu_UTF8 >> #reboot > > I know it'll be a little bit offtopic (when considered to list > caption), but you don't have to (also you shouldn't) reboot your > machine after a locale generation. Just login again to take changes > effect. (You know, one of the best side of Linux as a server OS is > that you don't have to reboot system after every configuration. Please > give up your Micros~1 habbits. :) > >> #/etc/init.d/postgresql stop >> #su postgres -> postgres@srv02: >> >> empty directory /var/lib/postgres/data >> >> /usr/lib/postgresql/bin/initdb -U >> postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \ >> -D /var/lib/postgres/data -W --pwprompt >> ... set same password for "postgres" root > > -W and --pwprompt is just the same. Just chose one of 'em. > > On 5/1/05, "Havasvölgyi Ottó" <h.otto@freemail.hu> wrote: > So far I have tried the first >> and the second with initdb, set the --locale, --lc-collate and --lc-ctype >> to that locale. But after filling a table with Hunagrian characters, and >> selecting it all with order by didn't give the result I expected. The >> encoding of the database and the client was both LATIN2. Neither worked >> the upper() and lower() functions correct with Hungarian characters. > > As Tövis used above, you can pass collating locale to the database. If > there're still problems in sorting, I advice you for checking your > compiler version. (Probably gcc.) For instance, there were same > problem for Turkish locale too and we found that it's related with the > compiler, not with PostgreSQL. > > And another question. How did you typed your entries to the database? > Using psql or any other client. Would you also try to import from a > file which has Hungarian characters? > > Regards. > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
Tövis, Just do this in psql: \encoding LATIN2 --set client encoding create table collation_test(c char(1)); --if default encoding is LATIN2 --fill the table insert into collation_test values('á'); insert into collation_test values('é'); insert into collation_test values('í'); insert into collation_test values('ó'); insert into collation_test values('ö'); insert into collation_test values('ő'); insert into collation_test values('ú'); insert into collation_test values('ü'); insert into collation_test values('ű'); insert into collation_test values('Á'); insert into collation_test values('É'); insert into collation_test values('Í'); insert into collation_test values('Ó'); insert into collation_test values('Ö'); insert into collation_test values('Ő'); insert into collation_test values('Ú'); insert into collation_test values('Ü'); insert into collation_test values('Ű'); insert into collation_test values('A'); insert into collation_test values('g'); insert into collation_test values('k'); insert into collation_test values('S'); select * from collation_test order by c; select upper('á'); --'á' for me select lower('Á'); --'Á' for me I guess I got all characters correctly in the table query, they were just in the wrong order. However, the upper() didn't work correctly for me. I did not try lower() after that. By the way what glibc version do you use? How did you manage to get Hungarian collation work correctly? Üdv / Best Regards, Otto ----- Original Message ----- From: "tövis" <tovises@freemail.hu> To: "pgsql novice" <pgsql-novice@postgresql.org>; "Havasvölgyi Ottó" <h.otto@freemail.hu> Sent: Monday, May 02, 2005 6:36 PM Subject: Re: [NOVICE] how to make collation work > Exciting! > Unfortunatelly I do not know how to check this quickly on my server;-( > Could you sent some simple sequence to check this out? > Üdv > Tövis > ----- Original Message ----- > From: "Havasvölgyi Ottó" <h.otto@freemail.hu> > To: <pgsql-novice@postgresql.org> > Sent: Monday, May 02, 2005 6:19 PM > Subject: Re: [NOVICE] how to make collation work > > >> Volkan, >> >> I passed the desired collation and ctype to initdb. >> The version of gcc is 3.4. Should I upgrade to 4.0 perhaps? >> As for filling the table, I did it with psql on the server, setting the >> client encoding to LATIN2, the same as in the database. Initially it was >> set >> to WIN1250. >> Then I tested the upper() function as well, but they worked only with >> English characters. For example: >> >> select upper('á'); >> >> I expected 'Á', but I got 'á'. So upper() didn't change this Hungarian >> character. >> >> Best Regards, >> Otto >> >> >> >> >> ----- Original Message ----- >> From: "Volkan YAZICI" <volkan.yazici@gmail.com> >> To: "tövis" <tovises@freemail.hu> >> Cc: "PostgreSQL Novice" <pgsql-novice@postgresql.org> >> Sent: Sunday, May 01, 2005 10:28 PM >> Subject: Re: [NOVICE] how to make collation work >> >> >> Hi, >> >> On 5/1/05, tövis <tovises@freemail.hu> wrote: >>> This was help me, on Debian Sarge. >>> PostgreSQL problems with collation sequence! >>> #dpkg-reconfigure locales - generate - "hu_HU" and NLS hu_UTF8 >>> #reboot >> >> I know it'll be a little bit offtopic (when considered to list >> caption), but you don't have to (also you shouldn't) reboot your >> machine after a locale generation. Just login again to take changes >> effect. (You know, one of the best side of Linux as a server OS is >> that you don't have to reboot system after every configuration. Please >> give up your Micros~1 habbits. :) >> >>> #/etc/init.d/postgresql stop >>> #su postgres -> postgres@srv02: >>> >>> empty directory /var/lib/postgres/data >>> >>> /usr/lib/postgresql/bin/initdb -U >>> postgres --locale=hu_HU --lc-collate=hu_HU -E LATIN2 \ >>> -D /var/lib/postgres/data -W --pwprompt >>> ... set same password for "postgres" root >> >> -W and --pwprompt is just the same. Just chose one of 'em. >> >> On 5/1/05, "Havasvölgyi Ottó" <h.otto@freemail.hu> wrote: >> So far I have tried the first >>> and the second with initdb, set the --locale, --lc-collate >>> and --lc-ctype >>> to that locale. But after filling a table with Hunagrian characters, and >>> selecting it all with order by didn't give the result I expected. The >>> encoding of the database and the client was both LATIN2. Neither worked >>> the upper() and lower() functions correct with Hungarian characters. >> >> As Tövis used above, you can pass collating locale to the database. If >> there're still problems in sorting, I advice you for checking your >> compiler version. (Probably gcc.) For instance, there were same >> problem for Turkish locale too and we found that it's related with the >> compiler, not with PostgreSQL. >> >> And another question. How did you typed your entries to the database? >> Using psql or any other client. Would you also try to import from a >> file which has Hungarian characters? >> >> Regards. >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 7: don't forget to increase your free space map settings >> >> >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > >
Volkan, I did not have the problem you had had, I mean the missing Hungarian (Turkish in your case) capital letters. I can load them into the table, and in the query all letters appear correctly, just the order is wrong. As the docs says, for upper() and lower() the lc-ctype switch is responsible. This determines what is a letter, and what letter is the capital pair of another letter. I have applied both lc-ctype, lc-collate switches with the hu_HU locale with no result. I will check glibc. Thanks for the suggestion. Best Regards, Otto ----- Original Message ----- From: "Volkan YAZICI" <volkan.yazici@gmail.com> To: "Havasvölgyi Ottó" <h.otto@freemail.hu> Sent: Monday, May 02, 2005 6:42 PM Subject: Re: [NOVICE] how to make collation work > Hi, > > On 5/2/05, Havasvölgyi Ottó <h.otto@freemail.hu> wrote: >> I passed the desired collation and ctype to initdb. >> The version of gcc is 3.4. Should I upgrade to 4.0 perhaps? > > Sorry I was mistaken. Just replace "gcc" words with "glibc" in my > previous reply. > > We have a similar problem for turkish characters too. For instance, > upper('ı') (dotless i) doesn't work on glibc < 2.3.4 version. I'm not > sure if Hungarian chars. have the same problem with glibc. I can only > advice you checking bug lists for Hungarian localization of glibc. > >> select upper('á'); >> >> I expected 'Á', but I got 'á'. So upper() didn't change this Hungarian >> character. > > A similar scenario: > > $ dpkg -l | grep libc6 > ii libc6 2.3.2.ds1-21 GNU C Library: Shared libraries and > Timezone > ii libc6-dev 2.3.2.ds1-21 GNU C Library: Development Libraries and > Hea > $ psql -l | grep sil > sil | knt | LATIN5 > $ psql sil > sil=> SET client_encoding TO LATIN5; > SET > sil=> SELECT upper('ı'); > upper > ------- > ı > (1 row) > -- It's not working. > > Furthermore, problems are not bounded with the above ones for Turkish > chars. You have to pass "-E UNICODE" param. to initdb too. (ISO-8859-9 > seems like a little bit buggy in here.) Anyway, I'm not so experienced > on localization. Just summarizing some threads discussed in Turkish > localization mailing lists. (Mostly, Devrim Gündüz worked on these > problems.) > > On the other hand, tövis said that he achieved to solve problems with > Hungarian chars. Maybe you can check your glibc and PostgreSQL > versions with him. > > Regards. >