Thread: BUG #1082: Order by doesn't sort correctly.

BUG #1082: Order by doesn't sort correctly.

From
"PostgreSQL Bugs List"
Date:
The following bug has been logged online:

Bug reference:      1082
Logged by:          Richard Neill

Email address:      rn214@cam.ac.uk

PostgreSQL version: 7.3.4

Operating system:   Linux

Description:        Order by doesn't sort correctly.

Details:

ORDER BY sorts the following in this order:

Cymbal #1
Cymbal - 18 inch
Cymbal #2

It ought to be thus:

Cymbal #1
Cymbal #2
Cymbal - 18 inch

or possibly thus:

Cymbal - 18 inch
Cymbal #1
Cymbal #2


-------------------------------------------------
Here's an example sql script to reproduce the bug.


CREATE TABLE tbl_testinstruments(
    instrumentid    integer,                    instrument     character varying(300)
);

INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (1,
'Antique Cymbals #1');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (2,
'Antique Cymbals #2');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (3, 'Clash
Cymbals, French - 20 inch');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (4,
'Cymbal #1');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (5,
'Cymbal #2');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (6,
'Cymbal - 18 inch');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (7,
'Cymbal, Sizzle');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (8,
'Cymbal, Splash');

SELECT instrument FROM tbl_testinstruments ORDER BY instrument;


------------------------------------------------
This is the output I get:

        instrument
---------------------------------
 Antique Cymbals #1
 Antique Cymbals #2
 Clash Cymbals, French - 20 inch
 Cymbal #1
 Cymbal - 18 inch
 Cymbal #2
 Cymbal, Sizzle
 Cymbal, Splash
(8 rows)




-------------------------------------------------
I'm using version:

 PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.1
(Mandrake Linux 9.2 3.3.1-1mdk)

Re: BUG #1082: Order by doesn't sort correctly.

From
Tom Lane
Date:
"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
> Description:        Order by doesn't sort correctly.

It almost certainly is the correct sort order according to the locale
you're using.  Use pg_controldata to check the database locale.  You'll
probably want to re-initdb in C locale.  Most non-C locales have weird
rules that try to approximate dictionary sort order.

            regards, tom lane

Re: BUG #1082: Order by doesn't sort correctly.

From
Tom Lane
Date:
Richard Neill <rn214@hermes.cam.ac.uk> writes:
> This ordering is perverse!

No kidding.

> No matter what the priority is of the
> different characters, I cannot understand how the above can arise.

You are assuming that it's a byte-at-a-time process.  It's not.  I
believe the first pass considers only letters and digits.

You can easily prove to yourself that it's not just Postgres.  Here's
an example on my Linux laptop:

[tgl@g3 tgl]$ cat zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$ LC_ALL=C sort zzz
Cymbal #1
Cymbal #2
Cymbal - 18 inch
[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$

            regards, tom lane

Re: BUG #1082: Order by doesn't sort correctly.

From
Andreas Pflug
Date:
Tom Lane wrote:

>Richard Neill <rn214@hermes.cam.ac.uk> writes:
>
>
>>This ordering is perverse!
>>
>>
>
>No kidding.
>
>
>
>>No matter what the priority is of the
>>different characters, I cannot understand how the above can arise.
>>
>>
>
>You are assuming that it's a byte-at-a-time process.  It's not.  I
>believe the first pass considers only letters and digits.
>
>You can easily prove to yourself that it's not just Postgres.  Here's
>an example on my Linux laptop:
>
>[tgl@g3 tgl]$ cat zzz
>Cymbal #1
>Cymbal - 18 inch
>Cymbal #2
>[tgl@g3 tgl]$ LC_ALL=C sort zzz
>Cymbal #1
>Cymbal #2
>Cymbal - 18 inch
>[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
>Cymbal #1
>Cymbal - 18 inch
>Cymbal #2
>[tgl@g3 tgl]$
>
>
I verified this, and it's not GB specific as one might suggest... Same
with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense
to anybody?

Regards,
Andreas

Re: BUG #1082: Order by doesn't sort correctly.

From
Devrim GUNDUZ
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hi,

On Tue, 24 Feb 2004, Andreas Pflug wrote:

<snip>
> I verified this, and it's not GB specific as one might suggest... Same
> with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense
> to anybody?

Ummm, same with tr_TR (for a long time).

We digged glibc and found out that tr_TR locale is broken there. Trying to
fix it. Maybe in the next release...

Regards,
- --
Devrim GUNDUZ
devrim@gunduz.org                devrim.gunduz@linux.org.tr
            http://www.TDMSoft.com
            http://www.gunduz.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFAO1Dctl86P3SPfQ4RAotmAJwJTUGRtYezYQDByGJ4f04+bJv3wgCgpz08
TxjQCMoYTYogCBSPYkjppLs=
=YyRq
-----END PGP SIGNATURE-----

Re: BUG #1082: Order by doesn't sort correctly.

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> I verified this, and it's not GB specific as one might suggest... Same
> with en_US, de_DE, fr_FR, af_ZA.

Yeah, most of the locales use dictionary ordering rules.

> Does this behaviour really make sense to anybody?

You'd have to argue about it with the people who work on locales.
AFAIK it is a standard of some kind.

            regards, tom lane

Re: BUG #1082: Order by doesn't sort correctly.

From
Richard Neill
Date:
Dear Tom,

Thank you for your explanation. It's very helpful, although I was
extremely surprised! I agree, it's not a postgresql bug.

Can I suggest it might be worth a mention on the "Order By" part of the
documentation.

i.e. this page:
http://www.postgresql.org/docs/7.3/static/sql-select.html#SQL-ORDERBY

could possibly use a little more emphasis of this last paragraph:
Data of character types is sorted according to the locale-specific
collation order that was established when the database cluster was
initialized.

or perhaps a link to here:
http://www.postgresql.org/docs/7.3/static/charset.html#AEN21582

I did realise that the sort would be locale dependent, but failed to
realise it wasn't byte-at-a-time.

Best wishes

Richard


Tom Lane wrote:
> Richard Neill <rn214@hermes.cam.ac.uk> writes:
>
>>This ordering is perverse!
>
>
> No kidding.
>
>
>>No matter what the priority is of the
>>different characters, I cannot understand how the above can arise.
>
>
> You are assuming that it's a byte-at-a-time process.  It's not.  I
> believe the first pass considers only letters and digits.
>
> You can easily prove to yourself that it's not just Postgres.  Here's
> an example on my Linux laptop:
>
> [tgl@g3 tgl]$ cat zzz
> Cymbal #1
> Cymbal - 18 inch
> Cymbal #2
> [tgl@g3 tgl]$ LC_ALL=C sort zzz
> Cymbal #1
> Cymbal #2
> Cymbal - 18 inch
> [tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
> Cymbal #1
> Cymbal - 18 inch
> Cymbal #2
> [tgl@g3 tgl]$
>
>             regards, tom lane
>

--
rn214@hermes.cam.ac.uk  **  http://www.richardneill.org
Richard Neill, Trinity College, Cambridge, CB21TQ, U.K.

Re: BUG #1082: Order by doesn't sort correctly.

From
Richard Neill
Date:
Tom Lane wrote:
> "PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
>
>>Description:        Order by doesn't sort correctly.
>
>
> It almost certainly is the correct sort order according to the locale
> you're using.  Use pg_controldata to check the database locale.  You'll
> probably want to re-initdb in C locale.  Most non-C locales have weird
> rules that try to approximate dictionary sort order.
>
>             regards, tom lane
>

Dear Tom,

Thanks for your email. I did check pg_controldata and found:
LC_COLLATE:  en_GB
LC_CTYPE:    en_GB

The bug isn't the particular ascii-betical (or other) order.
But what I am getting as a supposedly ordered list includes:

Cymbal #1
Cymbal - 18 inch
Cymbal #2

This ordering is perverse! No matter what the priority is of the
different characters, I cannot understand how the above can arise.
Whether '#' comes before or after '-', '#1' and '#2' should be adjacent.

Richard