Thread: Order By ignoring "-" character
I'm having a problem where the ORDER BY clause is not performing as I would expect. I created a small set of test data: create table orderbytest (name varchar(20)); insert into orderbytest values ('a'); insert into orderbytest values ('- b'); insert into orderbytest values ('c'); insert into orderbytest values ('- d'); insert into orderbytest values ('e'); insert into orderbytest values ('- F'); insert into orderbytest values ('-- g'); insert into orderbytest values ('-- h'); insert into orderbytest values ('I'); insert into orderbytest values ('j'); select * from orderbytest order by name; Running this in oracle, I get the following results, which are the results that I would expect: NAME ----- - F - b - d -- g -- h I a c e j Running the same in Postgresql 7.3 I get name ------ a - b c - d e - F -- g -- h I j Postgres not only seems to ignore the case of the text, implying a clause of "order by upper(name)", but it also seems to drop characters when doing the sort (the "-" and " " preceding the letters are not used during the order by). Doing some digging on this subject seemed to imply that my locale was not set correctly somehow. This is an out-of-the-box install of postgres from RedHat 9 in the en_US locale. I have confirmed this behavior on another out-of-the-box install on RedHat 8. If this isn't considered a bug, then could somebody please enlighten me as to the proper use of ORDER BY? In other words, how do I get postgres to order by using a sort typical of every text sort I've ever seen? Once that takes case into consideration and doesn't pick and choose which characters it will use for the sort. Thanks, -- Dave Wood
Dave Wood writes: > If this isn't considered a bug, then could somebody please enlighten me > as to the proper use of ORDER BY? Use a different locale. > In other words, how do I get postgres to order by using a sort typical > of every text sort I've ever seen? Once that takes case into > consideration and doesn't pick and choose which characters it will use > for the sort. I don't know what sort orders you've seen, but this behavior conforms to international and national sorting standards. For example, my dictionary states, "Entries are listed in alphabetical order without taking into account spaces or hyphens." Anyway, you should find that all characters and their properties are taken into account, just not with the same priority. Typically, a collation has four passes (speaking simplified): letter, accent, case, special characters. -- Peter Eisentraut peter_e@gmx.net
On Thu, 5 Jun 2003, Dave Wood wrote: > Postgres not only seems to ignore the case of the text, implying a > clause of "order by upper(name)", but it also seems to drop characters > when doing the sort (the "-" and " " preceding the letters are not used > during the order by). Doing some digging on this subject seemed to > imply that my locale was not set correctly somehow. This is an > out-of-the-box install of postgres from RedHat 9 in the en_US locale. I > have confirmed this behavior on another out-of-the-box install on RedHat > 8. > > If this isn't considered a bug, then could somebody please enlighten me > as to the proper use of ORDER BY? In other words, how do I get postgres > to order by using a sort typical of every text sort I've ever seen? > Once that takes case into consideration and doesn't pick and choose > which characters it will use for the sort. It's not a bug. We sort using the rules for the locale you chose at initdb time. The definition of en_US's sort rules on your system ignore case, spaces and some if not all symbols (I also use RedHat 9). To watch this, make sure you're in en_US and use the sort utility on arbitrary data. If you want more typical computer sort rules, you'll want to initdb with the "C" locale (I generally do something like LANG="C" initdb -D ... )
On Thu, Jun 05, 2003 at 11:43:08 -0400, Dave Wood <postgresql_question@attbi.com> wrote: > I'm having a problem where the ORDER BY clause is not performing as I > would expect. I created a small set of test data: > > If this isn't considered a bug, then could somebody please enlighten me > as to the proper use of ORDER BY? In other words, how do I get postgres > to order by using a sort typical of every text sort I've ever seen? > Once that takes case into consideration and doesn't pick and choose > which characters it will use for the sort. It is a locale issue. Order by sorts by the locale ordering. I believe the fix for this thi requires an initdb, specifying C for the locale.
Dave Wood <postgresql_question@attbi.com> writes: > Postgres not only seems to ignore the case of the text, implying a > clause of "order by upper(name)", but it also seems to drop characters > when doing the sort (the "-" and " " preceding the letters are not used > during the order by). Doing some digging on this subject seemed to > imply that my locale was not set correctly somehow. This is an > out-of-the-box install of postgres from RedHat 9 in the en_US locale. ^^^^^^^^^^^^ You need locale "C" if you want plain-ASCII sort order. Unfortunately, changing this requires a re-initdb (because it affects index ordering). There are ongoing discussions about whether or not we should force C as the default locale rather than accept the default from your environment. It seems that some people will be confused and unhappy no matter which way we do it :-( regards, tom lane