Re: BUG #2317: Wrong sorting order for (VW) - Mailing list pgsql-bugs

From Magnus Hagander
Subject Re: BUG #2317: Wrong sorting order for (VW)
Date
Msg-id 6BCB9D8A16AC4241919521715F4D8BCEA35143@algol.sollentuna.se
Whole thread Raw
In response to BUG #2317: Wrong sorting order for (VW)  ("TomasKlockar" <tomask@omicron.se>)
List pgsql-bugs
>             The following bug has been logged online:
>=20=09=09=09
>             Bug reference:      2317
>             Logged by:          TomasKlockar
>             Email address:      tomask@omicron.se
>             PostgreSQL version: 7.3.2/7.4.7
>             Operating system:   linux(fedora)
>             Description:        Wrong sorting order for (VW)
>             Details:
>=20=09=09=09
>             SELECT cname FROM clients ORDER BY cname;
>=20=09=09=09
>             sorts W before V and that is a mistake.
>=20=09=09=09
>             the result is
>=20=09=09=09
>             V
>             W
>             WHI
>             Vi
>             Wi
>             Volvo
>=20=09=09=09
>             Correct order would be to place all vV=20
> before all wW.
>=20=09=09=09
>             I think the database was initialized=20
> with UTF8 but it might=20
>=20=09=09=09=20=20=20=20=20=20
>=20
>         have been=20
>=20=09=09=20=20=20=20
>=20
>             ISO_8859-1, however the error is easily=20
> repeatable and ill=20
>=20=09=09=09=20=20=20=20=20=20
>=20
>         be happy to=20
>=20=09=09=20=20=20=20
>=20
>             provide a dump.
>=20=09=09=09
>             I tested this on 7.3.2 and 7.4.7 and=20
> the error was repeatable.
>=20=09=09=09=20=20=20=20=20=20
>=20
>         I think the most important thing we'd need to=20
> know is what=20
>         locale the database was initialized with in=20
> order to try to reproduce.
>=20=09=09
>         If you put similar data in a file and use the=20
> unix "sort"=20
>         command with the same locale, do you get the same order?
>=20=09=09=20=20=20=20
>=20
>=20=09
>     That looks like the typical swedish locale, which sorts=20
> V and W as the
>     same character. Yes, that can be very annoying for some=20
> of us :-), but
>     that's the way it's defined.
>=20=09
>     //Magnus
>=20=09=20=20
>=20
> I would say that sorting V and W as the same character may=20
> work in a dictionary or sometimes when sorting names, however=20
> in that case why don't it sort C and K as the same character?=20
> and in some cases C and S as the same.=20
>=20
> The best sorting algorithm should sort all characters as=20
> separate characters, and if you want fancy sorting you should=20
> need to turn it on for a table.
>=20
> Fancy sorting, like sorting Carlsson and Karlsson together=20
> since they are equal in the same way as=20
> Viktor=3DWiktor=3DVictor=3DWictor. Cesar and Sesar would also have=20
> to be sorted together since they are pronounced the same in=20
> swedish, and if you look in a phonebook you would find them=20
> at the same place.
>=20
> Now I get the english words was and vas sorted together when=20
> I need them separated.
>=20
> Currently the sortingfunction in postgreSQL have disqualified=20
> itself from beeing used, and I do the sorting in java which=20
> treat them as separate caracters.

Well, PostgreSQL uses the locale functionatlity provided by your OS, so you=
 will need to talk to them. Perhaps they already ship a different locale de=
finition that is more suitable for your needs that you can change to?=20

If you select locale=3DC you will get the sort all chars as separate charac=
ters. It will break your sorting of =E5=E4=F6, because they are not in the =
correct sequence in LATIN1 (or UTF8 for that matter), but you can certainly=
 do that. It's a matter of picking the correct locale when you initdb your =
database.

//Magnus

pgsql-bugs by date:

Previous
From: "Magnus Hagander"
Date:
Subject: Re: BUG #2317: Wrong sorting order for (VW)
Next
From: Andreas Pflug
Date:
Subject: Re: BUG #2318: language