Thread: Correct Unicode sorting depends on how initdb was run
Hi there, Recently I stumbled over a very strange problem: I had two very similar setups (RHL9 with latest updates, pgsql-7.3.2, identical parameters in "show all", databases with encoding=UNICODE, loaded from the same database dump), but on one of them the sorting was erroneous with regard to accented characters. After hours of fiddling I found out that the erroneous one was initdb'ed with the locale set to en_US, while the one sorting correctly was initdb'ed with the locale set to en_US.UTF-8. I pg_dumpall'ed the wrong one, redid the initdb with the locale set to en_US.UTF-8 and loaded the dumped databases; after that the sorting order was correct. Is this expected behaviour (I do not think so)? Nils -- Nils Philippsen / Red Hat / nphilipp@redhat.com "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -- B. Franklin, 1759 PGP fingerprint: C4A8 9474 5C4C ADE3 2B8F 656D 47D8 9B65 6951 3011
Nils Philippsen writes: > Is this expected behaviour Yes. -- Peter Eisentraut peter_e@gmx.net
On Mon, 2003-08-11 at 10:49, Peter Eisentraut wrote: > Nils Philippsen writes: > > > Is this expected behaviour > > Yes. Hmm. I ask myself whether this is desired behaviour, too. Given that this isn't obviously documented (at least I didn't find it), I'd expect sort order to depend on server_encoding or client_encoding, but not on a locale setting that was present at initialisation of the database structures (and which can't be changed except by dump&reload). Nils -- Nils Philippsen / Red Hat / nphilipp@redhat.com
Nils Philippsen writes: > On Mon, 2003-08-11 at 10:49, Peter Eisentraut wrote: > > Nils Philippsen writes: > > > > > Is this expected behaviour > > > > Yes. > > Hmm. I ask myself whether this is desired behaviour, too. No, but it will take a lot of work to fix this, such as implementing our own locale library. -- Peter Eisentraut peter_e@gmx.net
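For readers of the archive, here is a rough Python sketch of why the collation rules in effect matter for UTF-8 data. It is only an analogy: the byte-order key stands in for a collation that does not understand UTF-8, and `folded_key` is a hypothetical accent-folding key made up for this illustration. Neither reproduces what strcoll() actually does under en_US or en_US.UTF-8.

```python
import unicodedata

# Sample words; escapes are used so the code points are unambiguous.
words = ["zebra", "\u00e9tude", "eagle", "\u00c4rger", "apple"]  # étude, Ärger


def byte_order_key(s):
    # Compare the raw UTF-8 bytes, roughly what happens when the
    # collation in effect knows nothing about the encoding.
    return s.encode("utf-8")


def folded_key(s):
    # Hypothetical stand-in for a linguistic collation (an assumption,
    # not the en_US.UTF-8 rules): decompose, drop combining accents,
    # casefold, and use the original string as a tie-breaker.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


by_bytes = sorted(words, key=byte_order_key)
by_folded = sorted(words, key=folded_key)

print(by_bytes)   # accented words end up after "zebra"
print(by_folded)  # accented words sort next to their base letters
```

With the byte-order key, every word starting with a non-ASCII letter lands after "zebra", because its first UTF-8 byte is above 0x7F. That is the general kind of misordering of accented characters described at the start of the thread.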
Peter Eisentraut <peter_e@gmx.net> writes: > Nils Philippsen writes: >> Hmm. I ask myself whether this is desired behaviour, too. > No, but it will take a lot of work to fix this, such as implementing our > own locale library. We should, however, look into using C99-spec <wctype.h> routines where available --- the existing logic that depends on <ctype.h> stuff cannot work with multibyte encodings. I am not sure if this has any user-visible effects beyond upper()/lower(). regards, tom lane
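The point about <ctype.h> and multibyte encodings can be illustrated with a small Python sketch. It is an analogy rather than the server's code: `upper_bytewise` is a hypothetical helper that mimics a byte-at-a-time toupper() loop in the C locale, and `upper_by_character` decodes to characters first, which is roughly what wide-character routines make possible.

```python
text = "\u00e9t\u00e9"        # "été"
utf8 = text.encode("utf-8")   # b'\xc3\xa9t\xc3\xa9': each "é" is two bytes


def upper_bytewise(data: bytes) -> bytes:
    # Mimics a <ctype.h>-style loop in the C locale (an assumption made
    # for illustration): only single bytes in the ASCII range a-z change.
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in data)


def upper_by_character(data: bytes) -> bytes:
    # Decode to whole characters first, then apply the case mapping.
    return data.decode("utf-8").upper().encode("utf-8")


bytewise = upper_bytewise(utf8).decode("utf-8")
by_char = upper_by_character(utf8).decode("utf-8")

print(bytewise)  # only the ASCII "t" is upper-cased
print(by_char)   # every letter is upper-cased
```

A per-byte routine never sees the two bytes of "é" as one letter, so it can at best leave them alone. With a single-byte locale other than C it could even alter one of the bytes and corrupt the string.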