Re: BUG #2317: Wrong sorting order for (VW) - Mailing list pgsql-bugs
From | Magnus Hagander |
---|---|
Subject | Re: BUG #2317: Wrong sorting order for (VW) |
Date | |
Msg-id | 6BCB9D8A16AC4241919521715F4D8BCEA35143@algol.sollentuna.se Whole thread Raw |
In response to | BUG #2317: Wrong sorting order for (VW) ("TomasKlockar" <tomask@omicron.se>) |
List | pgsql-bugs |
> The following bug has been logged online: >=20=09=09=09 > Bug reference: 2317 > Logged by: TomasKlockar > Email address: tomask@omicron.se > PostgreSQL version: 7.3.2/7.4.7 > Operating system: linux(fedora) > Description: Wrong sorting order for (VW) > Details: >=20=09=09=09 > SELECT cname FROM clients ORDER BY cname; >=20=09=09=09 > sorts W before V and that is a mistake. >=20=09=09=09 > the result is >=20=09=09=09 > V > W > WHI > Vi > Wi > Volvo >=20=09=09=09 > Correct order would be to place all vV=20 > before all wW. >=20=09=09=09 > I think the database was initialized=20 > with UTF8 but it might=20 >=20=09=09=09=20=20=20=20=20=20 >=20 > have been=20 >=20=09=09=20=20=20=20 >=20 > ISO_8859-1, however the error is easily=20 > repeatable and ill=20 >=20=09=09=09=20=20=20=20=20=20 >=20 > be happy to=20 >=20=09=09=20=20=20=20 >=20 > provide a dump. >=20=09=09=09 > I tested this on 7.3.2 and 7.4.7 and=20 > the error was repeatable. >=20=09=09=09=20=20=20=20=20=20 >=20 > I think the most important thing we'd need to=20 > know is what=20 > locale the database was initialized with in=20 > order to try to reproduce. >=20=09=09 > If you put similar data in a file and use the=20 > unix "sort"=20 > command with the same locale, do you get the same order? >=20=09=09=20=20=20=20 >=20 >=20=09 > That looks like the typical swedish locale, which sorts=20 > V and W as the > same character. Yes, that can be very annoying for some=20 > of us :-), but > that's the way it's defined. >=20=09 > //Magnus >=20=09=20=20 >=20 > I would say that sorting V and W as the same character may=20 > work in a dictionary or sometimes when sorting names, however=20 > in that case why don't it sort C and K as the same character?=20 > and in some cases C and S as the same.=20 >=20 > The best sorting algorithm should sort all characters as=20 > separate characters, and if you want fancy sorting you should=20 > need to turn it on for a table. >=20 > Fancy sorting, like sorting Carlsson and Karlsson together=20 > since they are equal in the same way as=20 > Viktor=3DWiktor=3DVictor=3DWictor. Cesar and Sesar would also have=20 > to be sorted together since they are pronounced the same in=20 > swedish, and if you look in a phonebook you would find them=20 > at the same place. >=20 > Now I get the english words was and vas sorted together when=20 > I need them separated. >=20 > Currently the sortingfunction in postgreSQL have disqualified=20 > itself from beeing used, and I do the sorting in java which=20 > treat them as separate caracters. Well, PostgreSQL uses the locale functionatlity provided by your OS, so you= will need to talk to them. Perhaps they already ship a different locale de= finition that is more suitable for your needs that you can change to?=20 If you select locale=3DC you will get the sort all chars as separate charac= ters. It will break your sorting of =E5=E4=F6, because they are not in the = correct sequence in LATIN1 (or UTF8 for that matter), but you can certainly= do that. It's a matter of picking the correct locale when you initdb your = database. //Magnus
pgsql-bugs by date: