Thread: Sorting Problem in UNICODE/german
Hi there! I have a Problem with a DB that was created in UNICODE * createdb -E UNICODE and actually shows that it _is_ in UNICODE. i was able to input data and can read it and everything is fine. but when i want to "ORDER BY ..." it does not sort the german Umlauts at the correct postition. should be: m n o ö p and is: ö a b c d I have tried starting postgres with LC_ALL=de_AT.utf8@euro locale but that did not help. what xould i do? regs, klaus
Klaus Ita <postgres@stro.at> writes: > I have tried starting postgres with LC_ALL=de_AT.utf8@euro > locale but that did not help. You need to run initdb under that setting. See "Localization" in the documentation. regards, tom lane
On Thu, Sep 01, 2005 at 09:30:15AM -0400, Tom Lane wrote: > Klaus Ita <postgres@stro.at> writes: > > I have tried starting postgres with LC_ALL=de_AT.utf8@euro > > locale but that did not help. > i did read the docs and am still not quite happy with my sorting results. ok initdb has been rerun made sure, i had the locale: locale -a created new db-cluster with LC_ALL=de_AT.utf8@euro initdb --locale=de_AT.utf8@euro -E UNICODE -D /dev/shm/pgutf8 and then still the sorting was not right when i restored another UNICODE db. another "funny" thing is: ita@aipc54:~/.mutt$ LC_ALL=de_AT.utf8@euro sort /tmp/testfile Abend Oma Ãterreich Ãerflieger Unter Zetrix this is also wrong (There should be 'Unter' and then 'U:berflieger' [Überflieger]). so is this a libc bug? thank you for your help so far! I more than appreciate it. Support for this DB is sooo much better than for oracle! klaus
Klaus Ita schrob: > On Thu, Sep 01, 2005 at 09:30:15AM -0400, Tom Lane wrote: >> Klaus Ita <postgres@stro.at> writes: >> > I have tried starting postgres with LC_ALL=de_AT.utf8@euro >> > locale but that did not help. >> > > i did read the docs and am still not quite happy with my sorting results. > ok initdb has been rerun > > made sure, i had the locale: > > locale -a > > created new db-cluster with > LC_ALL=de_AT.utf8@euro initdb --locale=de_AT.utf8@euro -E UNICODE -D /dev/shm/pgutf8 > > and then still the sorting was not right when i restored another > UNICODE db. Well, I used the very same command with 8.0.3 to create a database, and the sort order was correct: --8<---------------cut here---------------start------------->8--- scratch=# select w from w order by w; w ------------- Abend Oma Österreich Überflieger Unter Zetrix (6 rows) --8<---------------cut here---------------end--------------->8--- So I guess there was some misconfiguration of your current client_encoding during import, or maybe the dump of your unicode db got unexpectedly converted by improper settings during dumping. > another "funny" thing is: > > ita@aipc54:~/.mutt$ LC_ALL=de_AT.utf8@euro sort /tmp/testfile > Abend > Oma > Ãterreich > Ãerflieger > Unter > Zetrix > > this is also wrong (There should be 'Unter' and then 'U:berflieger' > [Überflieger]). so is this a libc bug? The sort order is correct, so libc did succeed in its part. Maybe your terminal is having issues with utf-8? If you're using xterm: Did you run it with -u8 or some utf-8-enabling X-resource? To verify that the terminal is working properly, typing echo ö > /tmp/foo file /tmp/foo on a shell should tell you that you have a utf-8 text file. HTH Andreas --
Sorry, I just reread your mail: Your MUA is declaring it with > Content-Type: text/plain; charset=unknown-8bit This makes it even harder to discuss problems with Umlauts :-). Andreas Seltenreich schrob: > Klaus Ita schrob: > >> another "funny" thing is: >> >> ita@aipc54:~/.mutt$ LC_ALL=de_AT.utf8@euro sort /tmp/testfile >> Abend >> Oma >> Ãterreich >> Ãerflieger >> Unter >> Zetrix >> >> this is also wrong (There should be 'Unter' and then 'U:berflieger' >> [Überflieger]). so is this a libc bug? I think I got your point now. Libc appears to be using iso-14651 sorting for all "de" locales. I'm afraid you will have compile a customized locale to depart from that. regards, Andreas --
Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes: >> Klaus Ita schrob: >>> this is also wrong (There should be 'Unter' and then 'U:berflieger' >>> [Überflieger]). so is this a libc bug? > I think I got your point now. Libc appears to be using iso-14651 > sorting for all "de" locales. I'm afraid you will have compile a > customized locale to depart from that. I wouldn't call it a libc bug, but a bug in the locale definition. In any case it doesn't appear to be Postgres' problem --- if we sort the same way "sort" does under the same locale setting, then we are doing what we expect. I think at this point Klaus needs to find some people who know about hacking locale definitions. I sure don't know enough about them to help further. Is there a libc mailing list anywhere? One thing I do know --- if you install a new version of the locale Postgres is using, you'd better re-initdb, or at least REINDEX all your indexes on textual columns. Changing sort order is equivalent to making such indexes corrupt. regards, tom lane