Thread: BUG #2317: Wrong sorting order for (VW)
The following bug has been logged online: Bug reference: 2317 Logged by: TomasKlockar Email address: tomask@omicron.se PostgreSQL version: 7.3.2/7.4.7 Operating system: linux(fedora) Description: Wrong sorting order for (VW) Details: SELECT cname FROM clients ORDER BY cname; sorts W before V and that is a mistake. the result is V W WHI Vi Wi Volvo Correct order would be to place all vV before all wW. I think the database was initialized with UTF8 but it might have been ISO_8859-1, however the error is easily repeatable and ill be happy to provide a dump. I tested this on 7.3.2 and 7.4.7 and the error was repeatable. regards, /Tomas
On Mon, 13 Mar 2006, TomasKlockar wrote: > > The following bug has been logged online: > > Bug reference: 2317 > Logged by: TomasKlockar > Email address: tomask@omicron.se > PostgreSQL version: 7.3.2/7.4.7 > Operating system: linux(fedora) > Description: Wrong sorting order for (VW) > Details: > > SELECT cname FROM clients ORDER BY cname; > > sorts W before V and that is a mistake. > > the result is > > V > W > WHI > Vi > Wi > Volvo > > Correct order would be to place all vV before all wW. > > I think the database was initialized with UTF8 but it might have been > ISO_8859-1, however the error is easily repeatable and ill be happy to > provide a dump. > > I tested this on 7.3.2 and 7.4.7 and the error was repeatable. I think the most important thing we'd need to know is what locale the database was initialized with in order to try to reproduce. If you put similar data in a file and use the unix "sort" command with the same locale, do you get the same order?
"TomasKlockar" <tomask@omicron.se> writes: > I think the database was initialized with UTF8 but it might have been > ISO_8859-1, however the error is easily repeatable and ill be happy to > provide a dump. If the database encoding doesn't match the database locale (server LC_COLLATE setting), all kinds of very strange sorting behavior can ensue. I suspect you've been bit by this gotcha. regards, tom lane
> > The following bug has been logged online: > > > > Bug reference: 2317 > > Logged by: TomasKlockar > > Email address: tomask@omicron.se > > PostgreSQL version: 7.3.2/7.4.7 > > Operating system: linux(fedora) > > Description: Wrong sorting order for (VW) > > Details: > > > > SELECT cname FROM clients ORDER BY cname; > > > > sorts W before V and that is a mistake. > > > > the result is > > > > V > > W > > WHI > > Vi > > Wi > > Volvo > > > > Correct order would be to place all vV before all wW. > > > > I think the database was initialized with UTF8 but it might=20 > have been=20 > > ISO_8859-1, however the error is easily repeatable and ill=20 > be happy to=20 > > provide a dump. > > > > I tested this on 7.3.2 and 7.4.7 and the error was repeatable. >=20 > I think the most important thing we'd need to know is what=20 > locale the database was initialized with in order to try to reproduce. >=20 > If you put similar data in a file and use the unix "sort"=20 > command with the same locale, do you get the same order? That looks like the typical swedish locale, which sorts V and W as the same character. Yes, that can be very annoying for some of us :-), but that's the way it's defined. //Magnus
> The following bug has been logged online: >=20=09=09=09 > Bug reference: 2317 > Logged by: TomasKlockar > Email address: tomask@omicron.se > PostgreSQL version: 7.3.2/7.4.7 > Operating system: linux(fedora) > Description: Wrong sorting order for (VW) > Details: >=20=09=09=09 > SELECT cname FROM clients ORDER BY cname; >=20=09=09=09 > sorts W before V and that is a mistake. >=20=09=09=09 > the result is >=20=09=09=09 > V > W > WHI > Vi > Wi > Volvo >=20=09=09=09 > Correct order would be to place all vV=20 > before all wW. >=20=09=09=09 > I think the database was initialized=20 > with UTF8 but it might=20 >=20=09=09=09=20=20=20=20=20=20 >=20 > have been=20 >=20=09=09=20=20=20=20 >=20 > ISO_8859-1, however the error is easily=20 > repeatable and ill=20 >=20=09=09=09=20=20=20=20=20=20 >=20 > be happy to=20 >=20=09=09=20=20=20=20 >=20 > provide a dump. >=20=09=09=09 > I tested this on 7.3.2 and 7.4.7 and=20 > the error was repeatable. >=20=09=09=09=20=20=20=20=20=20 >=20 > I think the most important thing we'd need to=20 > know is what=20 > locale the database was initialized with in=20 > order to try to reproduce. >=20=09=09 > If you put similar data in a file and use the=20 > unix "sort"=20 > command with the same locale, do you get the same order? >=20=09=09=20=20=20=20 >=20 >=20=09 > That looks like the typical swedish locale, which sorts=20 > V and W as the > same character. Yes, that can be very annoying for some=20 > of us :-), but > that's the way it's defined. >=20=09 > //Magnus >=20=09=20=20 >=20 > I would say that sorting V and W as the same character may=20 > work in a dictionary or sometimes when sorting names, however=20 > in that case why don't it sort C and K as the same character?=20 > and in some cases C and S as the same.=20 >=20 > The best sorting algorithm should sort all characters as=20 > separate characters, and if you want fancy sorting you should=20 > need to turn it on for a table. >=20 > Fancy sorting, like sorting Carlsson and Karlsson together=20 > since they are equal in the same way as=20 > Viktor=3DWiktor=3DVictor=3DWictor. Cesar and Sesar would also have=20 > to be sorted together since they are pronounced the same in=20 > swedish, and if you look in a phonebook you would find them=20 > at the same place. >=20 > Now I get the english words was and vas sorted together when=20 > I need them separated. >=20 > Currently the sortingfunction in postgreSQL have disqualified=20 > itself from beeing used, and I do the sorting in java which=20 > treat them as separate caracters. Well, PostgreSQL uses the locale functionatlity provided by your OS, so you= will need to talk to them. Perhaps they already ship a different locale de= finition that is more suitable for your needs that you can change to?=20 If you select locale=3DC you will get the sort all chars as separate charac= ters. It will break your sorting of =E5=E4=F6, because they are not in the = correct sequence in LATIN1 (or UTF8 for that matter), but you can certainly= do that. It's a matter of picking the correct locale when you initdb your = database. //Magnus
Magnus Hagander wrote: >>>The following bug has been logged online: >>> >>>Bug reference: 2317 >>>Logged by: TomasKlockar >>>Email address: tomask@omicron.se >>>PostgreSQL version: 7.3.2/7.4.7 >>>Operating system: linux(fedora) >>>Description: Wrong sorting order for (VW) >>>Details: >>> >>>SELECT cname FROM clients ORDER BY cname; >>> >>>sorts W before V and that is a mistake. >>> >>>the result is >>> >>>V >>>W >>>WHI >>>Vi >>>Wi >>>Volvo >>> >>>Correct order would be to place all vV before all wW. >>> >>>I think the database was initialized with UTF8 but it might >>> >>> >>have been >> >> >>>ISO_8859-1, however the error is easily repeatable and ill >>> >>> >>be happy to >> >> >>>provide a dump. >>> >>>I tested this on 7.3.2 and 7.4.7 and the error was repeatable. >>> >>> >>I think the most important thing we'd need to know is what >>locale the database was initialized with in order to try to reproduce. >> >>If you put similar data in a file and use the unix "sort" >>command with the same locale, do you get the same order? >> >> > >That looks like the typical swedish locale, which sorts V and W as the >same character. Yes, that can be very annoying for some of us :-), but >that's the way it's defined. > >//Magnus > > I would say that sorting V and W as the same character may work in a dictionary or sometimes when sorting names, however in that case why don't it sort C and K as the same character? and in some cases C and S as the same. The best sorting algorithm should sort all characters as separate characters, and if you want fancy sorting you should need to turn it on for a table. Fancy sorting, like sorting Carlsson and Karlsson together since they are equal in the same way as Viktor=Wiktor=Victor=Wictor. Cesar and Sesar would also have to be sorted together since they are pronounced the same in swedish, and if you look in a phonebook you would find them at the same place. Now I get the english words was and vas sorted together when I need them separated. Currently the sortingfunction in postgreSQL have disqualified itself from beeing used, and I do the sorting in java which treat them as separate caracters. This is my 5 cent, /Tomas
On Thu, Mar 16, 2006 at 11:03:59AM +0100, Tomas Klockar wrote: [...] > I would say that sorting V and W as the same character may work in a=20 > dictionary [...] > Currently the sortingfunction in postgreSQL have disqualified itself=20 > from beeing used, and I do the sorting in java which treat them as=20 > separate caracters. >=20 > This is my 5 cent, Tomas, as it has been said, PostgreSQL uses the operating system's locale. So you can do two things: change it (e.b. to C , which is neutral) or fix it. If you are on an unix-like OS, man 5 locale might help with that. Regards -- tom=C3=A1s