Thread: Order By ignoring "-" character

Order By ignoring "-" character

From
Dave Wood
Date:
I'm having a problem where the ORDER BY clause is not performing as I
would expect.  I created a small set of test data:

create table orderbytest (name varchar(20));
insert into orderbytest values ('a');
insert into orderbytest values ('- b');
insert into orderbytest values ('c');
insert into orderbytest values ('- d');
insert into orderbytest values ('e');
insert into orderbytest values ('- F');
insert into orderbytest values ('-- g');
insert into orderbytest values ('-- h');
insert into orderbytest values ('I');
insert into orderbytest values ('j');
select * from orderbytest order by name;


Running this in oracle, I get the following results, which are the
results that I would expect:
NAME
-----
- F
- b
- d
-- g
-- h
I
a
c
e
j


Running the same in Postgresql 7.3 I get
name
------
 a
 - b
 c
 - d
 e
 - F
 -- g
 -- h
 I
 j


Postgres not only seems to ignore the case of the text, implying a
clause of "order by upper(name)", but it also seems to drop characters
when doing the sort (the "-" and " " preceding the letters are not used
during the order by).  Doing some digging on this subject seemed to
imply that my locale was not set correctly somehow.  This is an
out-of-the-box install of postgres from RedHat 9 in the en_US locale.  I
have confirmed this behavior on another out-of-the-box install on RedHat
8.

If this isn't considered a bug, then could somebody please enlighten me
as to the proper use of ORDER BY?  In other words, how do I get postgres
to order by using a sort typical of every text sort I've ever seen?
Once that takes case into consideration and doesn't pick and choose
which characters it will use for the sort.

Thanks,

-- Dave Wood


Re: Order By ignoring "-" character

From
Peter Eisentraut
Date:
Dave Wood writes:

> If this isn't considered a bug, then could somebody please enlighten me
> as to the proper use of ORDER BY?

Use a different locale.

> In other words, how do I get postgres to order by using a sort typical
> of every text sort I've ever seen?  Once that takes case into
> consideration and doesn't pick and choose which characters it will use
> for the sort.

I don't know what sort orders you've seen, but this behavior conforms to
international and national sorting standards.  For example, my dictionary
states, "Entries are listed in alphabetical order without taking into
account spaces or hyphens."  Anyway, you should find that all characters
and their properties are taken into account, just not with the same
priority.  Typically, a collation has four passes (speaking simplified):
letter, accent, case, special characters.

--
Peter Eisentraut   peter_e@gmx.net


Re: Order By ignoring "-" character

From
Stephan Szabo
Date:
On Thu, 5 Jun 2003, Dave Wood wrote:

> Postgres not only seems to ignore the case of the text, implying a
> clause of "order by upper(name)", but it also seems to drop characters
> when doing the sort (the "-" and " " preceding the letters are not used
> during the order by).  Doing some digging on this subject seemed to
> imply that my locale was not set correctly somehow.  This is an
> out-of-the-box install of postgres from RedHat 9 in the en_US locale.  I
> have confirmed this behavior on another out-of-the-box install on RedHat
> 8.
>
> If this isn't considered a bug, then could somebody please enlighten me
> as to the proper use of ORDER BY?  In other words, how do I get postgres
> to order by using a sort typical of every text sort I've ever seen?
> Once that takes case into consideration and doesn't pick and choose
> which characters it will use for the sort.

It's not a bug. We sort using the rules for the locale you chose at initdb
time. The definition of en_US's sort rules on your system ignore case,
spaces and some if not all symbols (I also use RedHat 9). To watch this,
make sure you're in en_US and use the sort utility on arbitrary data.

If you want more typical computer sort rules, you'll want to initdb with
the "C" locale (I generally do something like LANG="C" initdb -D ... )


Re: Order By ignoring "-" character

From
Bruno Wolff III
Date:
On Thu, Jun 05, 2003 at 11:43:08 -0400,
  Dave Wood <postgresql_question@attbi.com> wrote:
> I'm having a problem where the ORDER BY clause is not performing as I
> would expect.  I created a small set of test data:
>
> If this isn't considered a bug, then could somebody please enlighten me
> as to the proper use of ORDER BY?  In other words, how do I get postgres
> to order by using a sort typical of every text sort I've ever seen?
> Once that takes case into consideration and doesn't pick and choose
> which characters it will use for the sort.

It is a locale issue. Order by sorts by the locale ordering. I believe
the fix for this thi requires an initdb, specifying C for the locale.

Re: Order By ignoring "-" character

From
Tom Lane
Date:
Dave Wood <postgresql_question@attbi.com> writes:
> Postgres not only seems to ignore the case of the text, implying a
> clause of "order by upper(name)", but it also seems to drop characters
> when doing the sort (the "-" and " " preceding the letters are not used
> during the order by).  Doing some digging on this subject seemed to
> imply that my locale was not set correctly somehow.  This is an
> out-of-the-box install of postgres from RedHat 9 in the en_US locale.
                                                          ^^^^^^^^^^^^

You need locale "C" if you want plain-ASCII sort order.  Unfortunately,
changing this requires a re-initdb (because it affects index ordering).

There are ongoing discussions about whether or not we should force C as
the default locale rather than accept the default from your environment.
It seems that some people will be confused and unhappy no matter which
way we do it :-(

            regards, tom lane