Thread: BUG #1082: Order by doesn't sort correctly.
The following bug has been logged online: Bug reference: 1082 Logged by: Richard Neill Email address: rn214@cam.ac.uk PostgreSQL version: 7.3.4 Operating system: Linux Description: Order by doesn't sort correctly. Details: ORDER BY sorts the following in this order: Cymbal #1 Cymbal - 18 inch Cymbal #2 It ought to be thus: Cymbal #1 Cymbal #2 Cymbal - 18 inch or possibly thus: Cymbal - 18 inch Cymbal #1 Cymbal #2 ------------------------------------------------- Here's an example sql script to reproduce the bug. CREATE TABLE tbl_testinstruments( instrumentid integer, instrument character varying(300) ); INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (1, 'Antique Cymbals #1'); INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (2, 'Antique Cymbals #2'); INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (3, 'Clash Cymbals, French - 20 inch'); INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (4, 'Cymbal #1'); INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (5, 'Cymbal #2'); INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (6, 'Cymbal - 18 inch'); INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (7, 'Cymbal, Sizzle'); INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (8, 'Cymbal, Splash'); SELECT instrument FROM tbl_testinstruments ORDER BY instrument; ------------------------------------------------ This is the output I get: instrument --------------------------------- Antique Cymbals #1 Antique Cymbals #2 Clash Cymbals, French - 20 inch Cymbal #1 Cymbal - 18 inch Cymbal #2 Cymbal, Sizzle Cymbal, Splash (8 rows) ------------------------------------------------- I'm using version: PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.1 (Mandrake Linux 9.2 3.3.1-1mdk)
"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes: > Description: Order by doesn't sort correctly. It almost certainly is the correct sort order according to the locale you're using. Use pg_controldata to check the database locale. You'll probably want to re-initdb in C locale. Most non-C locales have weird rules that try to approximate dictionary sort order. regards, tom lane
Richard Neill <rn214@hermes.cam.ac.uk> writes: > This ordering is perverse! No kidding. > No matter what the priority is of the > different characters, I cannot understand how the above can arise. You are assuming that it's a byte-at-a-time process. It's not. I believe the first pass considers only letters and digits. You can easily prove to yourself that it's not just Postgres. Here's an example on my Linux laptop: [tgl@g3 tgl]$ cat zzz Cymbal #1 Cymbal - 18 inch Cymbal #2 [tgl@g3 tgl]$ LC_ALL=C sort zzz Cymbal #1 Cymbal #2 Cymbal - 18 inch [tgl@g3 tgl]$ LC_ALL=en_GB sort zzz Cymbal #1 Cymbal - 18 inch Cymbal #2 [tgl@g3 tgl]$ regards, tom lane
Tom Lane wrote: >Richard Neill <rn214@hermes.cam.ac.uk> writes: > > >>This ordering is perverse! >> >> > >No kidding. > > > >>No matter what the priority is of the >>different characters, I cannot understand how the above can arise. >> >> > >You are assuming that it's a byte-at-a-time process. It's not. I >believe the first pass considers only letters and digits. > >You can easily prove to yourself that it's not just Postgres. Here's >an example on my Linux laptop: > >[tgl@g3 tgl]$ cat zzz >Cymbal #1 >Cymbal - 18 inch >Cymbal #2 >[tgl@g3 tgl]$ LC_ALL=C sort zzz >Cymbal #1 >Cymbal #2 >Cymbal - 18 inch >[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz >Cymbal #1 >Cymbal - 18 inch >Cymbal #2 >[tgl@g3 tgl]$ > > I verified this, and it's not GB specific as one might suggest... Same with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense to anybody? Regards, Andreas
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, On Tue, 24 Feb 2004, Andreas Pflug wrote: <snip> > I verified this, and it's not GB specific as one might suggest... Same > with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense > to anybody? Ummm, same with tr_TR (for a long time). We digged glibc and found out that tr_TR locale is broken there. Trying to fix it. Maybe in the next release... Regards, - -- Devrim GUNDUZ devrim@gunduz.org devrim.gunduz@linux.org.tr http://www.TDMSoft.com http://www.gunduz.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFAO1Dctl86P3SPfQ4RAotmAJwJTUGRtYezYQDByGJ4f04+bJv3wgCgpz08 TxjQCMoYTYogCBSPYkjppLs= =YyRq -----END PGP SIGNATURE-----
Andreas Pflug <pgadmin@pse-consulting.de> writes: > I verified this, and it's not GB specific as one might suggest... Same > with en_US, de_DE, fr_FR, af_ZA. Yeah, most of the locales use dictionary ordering rules. > Does this behaviour really make sense to anybody? You'd have to argue about it with the people who work on locales. AFAIK it is a standard of some kind. regards, tom lane
Dear Tom, Thank you for your explanation. It's very helpful, although I was extremely surprised! I agree, it's not a postgresql bug. Can I suggest it might be worth a mention on the "Order By" part of the documentation. i.e. this page: http://www.postgresql.org/docs/7.3/static/sql-select.html#SQL-ORDERBY could possibly use a little more emphasis of this last paragraph: Data of character types is sorted according to the locale-specific collation order that was established when the database cluster was initialized. or perhaps a link to here: http://www.postgresql.org/docs/7.3/static/charset.html#AEN21582 I did realise that the sort would be locale dependent, but failed to realise it wasn't byte-at-a-time. Best wishes Richard Tom Lane wrote: > Richard Neill <rn214@hermes.cam.ac.uk> writes: > >>This ordering is perverse! > > > No kidding. > > >>No matter what the priority is of the >>different characters, I cannot understand how the above can arise. > > > You are assuming that it's a byte-at-a-time process. It's not. I > believe the first pass considers only letters and digits. > > You can easily prove to yourself that it's not just Postgres. Here's > an example on my Linux laptop: > > [tgl@g3 tgl]$ cat zzz > Cymbal #1 > Cymbal - 18 inch > Cymbal #2 > [tgl@g3 tgl]$ LC_ALL=C sort zzz > Cymbal #1 > Cymbal #2 > Cymbal - 18 inch > [tgl@g3 tgl]$ LC_ALL=en_GB sort zzz > Cymbal #1 > Cymbal - 18 inch > Cymbal #2 > [tgl@g3 tgl]$ > > regards, tom lane > -- rn214@hermes.cam.ac.uk ** http://www.richardneill.org Richard Neill, Trinity College, Cambridge, CB21TQ, U.K.
Tom Lane wrote: > "PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes: > >>Description: Order by doesn't sort correctly. > > > It almost certainly is the correct sort order according to the locale > you're using. Use pg_controldata to check the database locale. You'll > probably want to re-initdb in C locale. Most non-C locales have weird > rules that try to approximate dictionary sort order. > > regards, tom lane > Dear Tom, Thanks for your email. I did check pg_controldata and found: LC_COLLATE: en_GB LC_CTYPE: en_GB The bug isn't the particular ascii-betical (or other) order. But what I am getting as a supposedly ordered list includes: Cymbal #1 Cymbal - 18 inch Cymbal #2 This ordering is perverse! No matter what the priority is of the different characters, I cannot understand how the above can arise. Whether '#' comes before or after '-', '#1' and '#2' should be adjacent. Richard