Thread: BUG #2317: Wrong sorting order for (VW)

BUG #2317: Wrong sorting order for (VW)

From
"TomasKlockar"
Date:
The following bug has been logged online:

Bug reference:      2317
Logged by:          TomasKlockar
Email address:      tomask@omicron.se
PostgreSQL version: 7.3.2/7.4.7
Operating system:   linux(fedora)
Description:        Wrong sorting order for (VW)
Details:

SELECT cname FROM clients ORDER BY cname;

sorts W before V and that is a mistake.

the result is

V
W
WHI
Vi
Wi
Volvo

Correct order would be to place all vV before all wW.

I think the database was initialized with UTF8 but it might have been
ISO_8859-1, however the error is easily repeatable and ill be happy to
provide a dump.

I tested this on 7.3.2 and 7.4.7 and the error was repeatable.

regards,

/Tomas

Re: BUG #2317: Wrong sorting order for (VW)

From
Stephan Szabo
Date:
On Mon, 13 Mar 2006, TomasKlockar wrote:

>
> The following bug has been logged online:
>
> Bug reference:      2317
> Logged by:          TomasKlockar
> Email address:      tomask@omicron.se
> PostgreSQL version: 7.3.2/7.4.7
> Operating system:   linux(fedora)
> Description:        Wrong sorting order for (VW)
> Details:
>
> SELECT cname FROM clients ORDER BY cname;
>
> sorts W before V and that is a mistake.
>
> the result is
>
> V
> W
> WHI
> Vi
> Wi
> Volvo
>
> Correct order would be to place all vV before all wW.
>
> I think the database was initialized with UTF8 but it might have been
> ISO_8859-1, however the error is easily repeatable and ill be happy to
> provide a dump.
>
> I tested this on 7.3.2 and 7.4.7 and the error was repeatable.

I think the most important thing we'd need to know is what locale the
database was initialized with in order to try to reproduce.

If you put similar data in a file and use the unix "sort" command with the
same locale, do you get the same order?

Re: BUG #2317: Wrong sorting order for (VW)

From
Tom Lane
Date:
"TomasKlockar" <tomask@omicron.se> writes:
> I think the database was initialized with UTF8 but it might have been
> ISO_8859-1, however the error is easily repeatable and ill be happy to
> provide a dump.

If the database encoding doesn't match the database locale (server
LC_COLLATE setting), all kinds of very strange sorting behavior can
ensue.  I suspect you've been bit by this gotcha.

            regards, tom lane

Re: BUG #2317: Wrong sorting order for (VW)

From
"Magnus Hagander"
Date:
> > The following bug has been logged online:
> >
> > Bug reference:      2317
> > Logged by:          TomasKlockar
> > Email address:      tomask@omicron.se
> > PostgreSQL version: 7.3.2/7.4.7
> > Operating system:   linux(fedora)
> > Description:        Wrong sorting order for (VW)
> > Details:
> >
> > SELECT cname FROM clients ORDER BY cname;
> >
> > sorts W before V and that is a mistake.
> >
> > the result is
> >
> > V
> > W
> > WHI
> > Vi
> > Wi
> > Volvo
> >
> > Correct order would be to place all vV before all wW.
> >
> > I think the database was initialized with UTF8 but it might=20
> have been=20
> > ISO_8859-1, however the error is easily repeatable and ill=20
> be happy to=20
> > provide a dump.
> >
> > I tested this on 7.3.2 and 7.4.7 and the error was repeatable.
>=20
> I think the most important thing we'd need to know is what=20
> locale the database was initialized with in order to try to reproduce.
>=20
> If you put similar data in a file and use the unix "sort"=20
> command with the same locale, do you get the same order?

That looks like the typical swedish locale, which sorts V and W as the
same character. Yes, that can be very annoying for some of us :-), but
that's the way it's defined.

//Magnus

Re: BUG #2317: Wrong sorting order for (VW)

From
"Magnus Hagander"
Date:
>             The following bug has been logged online:
>=20=09=09=09
>             Bug reference:      2317
>             Logged by:          TomasKlockar
>             Email address:      tomask@omicron.se
>             PostgreSQL version: 7.3.2/7.4.7
>             Operating system:   linux(fedora)
>             Description:        Wrong sorting order for (VW)
>             Details:
>=20=09=09=09
>             SELECT cname FROM clients ORDER BY cname;
>=20=09=09=09
>             sorts W before V and that is a mistake.
>=20=09=09=09
>             the result is
>=20=09=09=09
>             V
>             W
>             WHI
>             Vi
>             Wi
>             Volvo
>=20=09=09=09
>             Correct order would be to place all vV=20
> before all wW.
>=20=09=09=09
>             I think the database was initialized=20
> with UTF8 but it might=20
>=20=09=09=09=20=20=20=20=20=20
>=20
>         have been=20
>=20=09=09=20=20=20=20
>=20
>             ISO_8859-1, however the error is easily=20
> repeatable and ill=20
>=20=09=09=09=20=20=20=20=20=20
>=20
>         be happy to=20
>=20=09=09=20=20=20=20
>=20
>             provide a dump.
>=20=09=09=09
>             I tested this on 7.3.2 and 7.4.7 and=20
> the error was repeatable.
>=20=09=09=09=20=20=20=20=20=20
>=20
>         I think the most important thing we'd need to=20
> know is what=20
>         locale the database was initialized with in=20
> order to try to reproduce.
>=20=09=09
>         If you put similar data in a file and use the=20
> unix "sort"=20
>         command with the same locale, do you get the same order?
>=20=09=09=20=20=20=20
>=20
>=20=09
>     That looks like the typical swedish locale, which sorts=20
> V and W as the
>     same character. Yes, that can be very annoying for some=20
> of us :-), but
>     that's the way it's defined.
>=20=09
>     //Magnus
>=20=09=20=20
>=20
> I would say that sorting V and W as the same character may=20
> work in a dictionary or sometimes when sorting names, however=20
> in that case why don't it sort C and K as the same character?=20
> and in some cases C and S as the same.=20
>=20
> The best sorting algorithm should sort all characters as=20
> separate characters, and if you want fancy sorting you should=20
> need to turn it on for a table.
>=20
> Fancy sorting, like sorting Carlsson and Karlsson together=20
> since they are equal in the same way as=20
> Viktor=3DWiktor=3DVictor=3DWictor. Cesar and Sesar would also have=20
> to be sorted together since they are pronounced the same in=20
> swedish, and if you look in a phonebook you would find them=20
> at the same place.
>=20
> Now I get the english words was and vas sorted together when=20
> I need them separated.
>=20
> Currently the sortingfunction in postgreSQL have disqualified=20
> itself from beeing used, and I do the sorting in java which=20
> treat them as separate caracters.

Well, PostgreSQL uses the locale functionatlity provided by your OS, so you=
 will need to talk to them. Perhaps they already ship a different locale de=
finition that is more suitable for your needs that you can change to?=20

If you select locale=3DC you will get the sort all chars as separate charac=
ters. It will break your sorting of =E5=E4=F6, because they are not in the =
correct sequence in LATIN1 (or UTF8 for that matter), but you can certainly=
 do that. It's a matter of picking the correct locale when you initdb your =
database.

//Magnus

Re: BUG #2317: Wrong sorting order for (VW)

From
Tomas Klockar
Date:
Magnus Hagander wrote:

>>>The following bug has been logged online:
>>>
>>>Bug reference:      2317
>>>Logged by:          TomasKlockar
>>>Email address:      tomask@omicron.se
>>>PostgreSQL version: 7.3.2/7.4.7
>>>Operating system:   linux(fedora)
>>>Description:        Wrong sorting order for (VW)
>>>Details:
>>>
>>>SELECT cname FROM clients ORDER BY cname;
>>>
>>>sorts W before V and that is a mistake.
>>>
>>>the result is
>>>
>>>V
>>>W
>>>WHI
>>>Vi
>>>Wi
>>>Volvo
>>>
>>>Correct order would be to place all vV before all wW.
>>>
>>>I think the database was initialized with UTF8 but it might
>>>
>>>
>>have been
>>
>>
>>>ISO_8859-1, however the error is easily repeatable and ill
>>>
>>>
>>be happy to
>>
>>
>>>provide a dump.
>>>
>>>I tested this on 7.3.2 and 7.4.7 and the error was repeatable.
>>>
>>>
>>I think the most important thing we'd need to know is what
>>locale the database was initialized with in order to try to reproduce.
>>
>>If you put similar data in a file and use the unix "sort"
>>command with the same locale, do you get the same order?
>>
>>
>
>That looks like the typical swedish locale, which sorts V and W as the
>same character. Yes, that can be very annoying for some of us :-), but
>that's the way it's defined.
>
>//Magnus
>
>
I would say that sorting V and W as the same character may work in a
dictionary or sometimes when sorting names, however in that case why
don't it sort C and K as the same character? and in some cases C and S
as the same.

The best sorting algorithm should sort all characters as separate
characters, and if you want fancy sorting you should need to turn it on
for a table.

Fancy sorting, like sorting Carlsson and Karlsson together since they
are equal in the same way as Viktor=Wiktor=Victor=Wictor. Cesar and
Sesar would also have to be sorted together since they are pronounced
the same in swedish, and if you look in a phonebook you would find them
at the same place.

Now I get the english words was and vas sorted together when I need them
separated.

Currently the sortingfunction in postgreSQL have disqualified itself
from beeing used, and I do the sorting in java which treat them as
separate caracters.

This is my 5 cent,

/Tomas

Re: BUG #2317: Wrong sorting order for (VW)

From
tomas@tuxteam.de (Tomas Zerolo)
Date:
On Thu, Mar 16, 2006 at 11:03:59AM +0100, Tomas Klockar wrote:
[...]
> I would say that sorting V and W as the same character may work in a=20
> dictionary [...]

> Currently the sortingfunction in postgreSQL have disqualified itself=20
> from beeing used, and I do the sorting in java which treat them as=20
> separate caracters.
>=20
> This is my 5 cent,

Tomas, as it has been said, PostgreSQL uses the operating system's
locale. So you can do two things: change it (e.b. to C , which is
neutral) or fix it. If you are on an unix-like OS, man 5 locale might
help with that.

Regards
-- tom=C3=A1s