Thread: Correct Unicode sorting depends on how initdb was run

From: Nils Philippsen
Date: 11 August 2003, 03:37:53

Hi there,

Recently I stumbled over a very strange problem: I had two very similar
setups (RHL9 with latest updates, pgsql-7.3.2, parameters in "show all"
the same, databases with encoding=UNICODE, loaded from the same database
dump) where the sorting on one was erroneous with regard to accented
characters.

After hours of fiddling I found out that the erroneous one was initdb'ed
with locale set to en_US, while the one sorting correctly was initdb'ed
with locale set to en_US.UTF-8. I pg_dumpall'ed the wrong one, redid the
initdb with locale set to en_US.UTF-8 and loaded the dumped databases;
after that the sorting order was correct.
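
For illustration, the effect can be reproduced outside the database. The sketch below is not PostgreSQL code; it only assumes that string comparison is delegated to the C library's strcoll() under whatever collation locale is active, and that the data is UTF-8. The function names sort_bytes and sort_with_locale are made up for this example.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: raw byte order, which is also what the "C" locale gives. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* qsort comparator: order defined by the current LC_COLLATE locale. */
static int cmp_locale(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of strings by byte value, ignoring any locale. */
void sort_bytes(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, cmp_bytes);
}

/*
 * Sort an array of strings using the collation rules of locale_name
 * (hypothetical helper). Returns 0 on success, or -1 if the locale is
 * not installed on this system, in which case the array is left untouched.
 */
int sort_with_locale(const char **words, size_t n, const char *locale_name)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(words, n, sizeof *words, cmp_locale);
    return 0;
}
```

Sorting the UTF-8 words "zebra", "école" and "eagle" with sort_bytes() puts "école" last, because its first byte (0xC3) is larger than any ASCII letter. With sort_with_locale(words, 3, "en_US.UTF-8"), on a system where that locale is installed, the accented word would normally be placed among the other words starting with "e".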

Is this expected behaviour (I do not think so)?

Nils
--
     Nils Philippsen    /    Red Hat    /    nphilipp@redhat.com
"They that can give up essential liberty to obtain a little temporary
 safety deserve neither liberty nor safety."     -- B. Franklin, 1759
 PGP fingerprint:  C4A8 9474 5C4C ADE3 2B8F  656D 47D8 9B65 6951 3011

Re: Correct Unicode sorting depends on how initdb was run

From: Peter Eisentraut
Date: 11 August 2003, 05:58:50

Nils Philippsen writes:

> Is this expected behaviour

Yes.

--
Peter Eisentraut   peter_e@gmx.net

Re: Correct Unicode sorting depends on how initdb was run

From: Nils Philippsen
Date: 11 August 2003, 08:27:48

On Mon, 2003-08-11 at 10:49, Peter Eisentraut wrote:
> Nils Philippsen writes:
>
> > Is this expected behaviour
>
> Yes.

Hmm. I ask myself whether this is desired behaviour, too.

Given that this isn't obviously documented (at least I didn't find it),
I'd expect sort order to be dependent on server_encoding or
client_encoding, but not on a locale setting that was present at
initialisation of the database structures (and which isn't changeable
except by dump&reload).

Nils
--
     Nils Philippsen    /    Red Hat    /    nphilipp@redhat.com
"They that can give up essential liberty to obtain a little temporary
 safety deserve neither liberty nor safety."     -- B. Franklin, 1759
 PGP fingerprint:  C4A8 9474 5C4C ADE3 2B8F  656D 47D8 9B65 6951 3011

Re: Correct Unicode sorting depends on how initdb was run

From: Peter Eisentraut
Date: 11 August 2003, 08:48:45

Nils Philippsen writes:

> On Mon, 2003-08-11 at 10:49, Peter Eisentraut wrote:
> > Nils Philippsen writes:
> >
> > > Is this expected behaviour
> >
> > Yes.
>
> Hmm. I ask myself whether this is desired behaviour, too.

No, but it will take a lot of work to fix this, such as implementing our
own locale library.

--
Peter Eisentraut   peter_e@gmx.net

Re: Correct Unicode sorting depends on how initdb was run

From: Tom Lane
Date: 11 August 2003, 12:04:32

Peter Eisentraut <peter_e@gmx.net> writes:
> Nils Philippsen writes:
>> Hmm. I ask myself whether this is desired behaviour, too.

> No, but it will take a lot of work to fix this, such as implementing our
> own locale library.

We should, however, look into using C99-spec <wctype.h> routines where
available --- the existing logic that depends on <ctype.h> stuff cannot
work with multibyte encodings.  I am not sure if this has any
user-visible effects beyond upper()/lower().

            regards, tom lane
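
To make the <ctype.h> versus <wctype.h> point concrete, here is a minimal sketch. It is not the PostgreSQL implementation; upper_bytes and upper_wide are hypothetical names, and the fixed 256-element buffer is an arbitrary choice for the example. The first function upper-cases byte by byte, so it can never handle a character that spans several bytes. The second decodes to wide characters first, which only works if the caller has set LC_CTYPE to a locale matching the string's encoding.

```c
#include <ctype.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Byte-wise upper-casing with <ctype.h>. Every byte is treated as a
 * complete character, so the bytes of a multibyte sequence are each
 * looked up on their own.
 */
void upper_bytes(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/*
 * Upper-casing via wide characters with <wctype.h>: decode the multibyte
 * string, apply towupper() to each character, and encode the result.
 * Input longer than 255 characters is truncated.
 * Returns 0 on success, or -1 if the input is not valid in the current
 * LC_CTYPE locale or the result does not fit into out.
 */
int upper_wide(const char *in, char *out, size_t outsize)
{
    wchar_t wbuf[256];
    size_t n = mbstowcs(wbuf, in, 255);

    if (n == (size_t)-1)
        return -1;                /* invalid multibyte sequence */
    wbuf[n] = L'\0';              /* mbstowcs may stop without terminating */

    for (size_t i = 0; i < n; i++)
        wbuf[i] = (wchar_t)towupper((wint_t)wbuf[i]);

    size_t m = wcstombs(out, wbuf, outsize);
    if (m == (size_t)-1 || m == outsize)
        return -1;                /* unconvertible, or no room for '\0' */
    return 0;
}
```

In the default "C" locale, upper_bytes() applied to the UTF-8 string "été" changes only the ASCII "t" and leaves the accented letters as they were. upper_wide() can upper-case them as well, provided the program has first called setlocale(LC_CTYPE, ...) with a UTF-8 locale that is installed on the system.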