Thread: Order by and strings

Order by and strings

From
Fredric Fredricson
Date:
Hi!
New to the list with a question that I cannot find the answer to in the manual or on the internet but I suspect is trivial. If somebody could point me in the correct direction I would be greatful.

This is what I do (condensed, of course):
# create table tmp ( x text ) ;
CREATE TABLE
# insert into tmp(x) values ('a'),('c'),('-b') ;
INSERT 0 3
# select * from tmp order by x ;
 x
----
 a
 -b
 c
(3 rows)

I would expect a string that start with a hyphen to be sorted before or after 'a' and 'c' and not between them. I have tried with a few other characters (space, opening parenthesis, etc) but the result is the same.

What I want is the strings sorted by their ascii (or UTF-8) values, without some "smart" heuristic. How do I accomplish this?

I cannot find this described in the manual (it should be there!) or on the net.

/Fredric

PS. 8.2, Fedora Linux

Attachment

Re: Order by and strings

From
Justin Graf
Date:
On 2/8/2010 7:09 PM, Fredric Fredricson wrote:
Hi!
New to the list with a question that I cannot find the answer to in the manual or on the internet but I suspect is trivial. If somebody could point me in the correct direction I would be greatful.

This is what I do (condensed, of course):
# create table tmp ( x text ) ;
CREATE TABLE
# insert into tmp(x) values ('a'),('c'),('-b') ;
INSERT 0 3
# select * from tmp order by x ;
 x
----
 a
 -b
 c
(3 rows)



It has to do with the collation you are using
see
http://www.postgresql.org/docs/8.1/interactive/charset.html

All legitimate Magwerks Corporation quotations are sent in a .PDF file attachment with a unique ID number generated by our proprietary quotation system. Quotations received via any other form of communication will not be honored.

CONFIDENTIALITY NOTICE: This e-mail, including attachments, may contain legally privileged, confidential or other information proprietary to Magwerks Corporation and is intended solely for the use of the individual to whom it addresses. If the reader of this e-mail is not the intended recipient or authorized agent, the reader is hereby notified that any unauthorized viewing, dissemination, distribution or copying of this e-mail is strictly prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and destroy all occurrences of this e-mail immediately.
Thank you.

Re: Order by and strings

From
Fredric Fredricson
Date:
Justin Graf wrote:
On 2/8/2010 7:09 PM, Fredric Fredricson wrote:
Hi!
New to the list with a question that I cannot find the answer to in the manual or on the internet but I suspect is trivial. If somebody could point me in the correct direction I would be greatful.

This is what I do (condensed, of course):
# create table tmp ( x text ) ;
CREATE TABLE
# insert into tmp(x) values ('a'),('c'),('-b') ;
INSERT 0 3
# select * from tmp order by x ;
 x
----
 a
 -b
 c
(3 rows)

It has to do with the collation you are using
I use locale en_US.UTF-8 but why this should affect how leading characters in strings are ignored is beyond me.

Another, in my mind, counter-intuitive example:
# create table tmp ( x text ) ;
# insert into tmp(x) (values ('a'),('-b'),('c'),('aa'),('---a-b'),('ac'),('1'),('-2'),('+3'),('4'),('-'),('+')) ;
# select * from tmp order by x ;
   x
--------
 -
 +
 1
 -2
 +3
 4
 a
 aa
 ---a-b
 ac
 -b
 c
(12 rows)

In what universe would you expect this sort order? And how to make it 'sane'?
I found a work-around, "order by ascii(x),x", but this continues to baffle me.

It seems to me that if there are any alphanumeric characters in the string the rest are ignored in the sort.
Where did this rule come from?
I really would appreciate an explanation for this behavior.

/Fredric

PS. I was wrong about the server version, it is 8.3.8.
Attachment

Re: Order by and strings

From
Scott Marlowe
Date:
On Tue, Feb 9, 2010 at 1:42 AM, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote:
> Justin Graf wrote:
>
> On 2/8/2010 7:09 PM, Fredric Fredricson wrote:
>
> Hi!
> New to the list with a question that I cannot find the answer to in the
> manual or on the internet but I suspect is trivial. If somebody could point
> me in the correct direction I would be greatful.

It seems you're seeking ASCII or C locale sorting.

> # select * from tmp order by x ;
>    x
> --------
>  -
>  +
>  1
>  -2
>  +3
>  4
>  a
>  aa
>  ---a-b
>  ac
>  -b
>  c
> (12 rows)
>
> In what universe would you expect this sort order? And how to make it
> 'sane'?

In a library perhaps?

> I found a work-around, "order by ascii(x),x", but this continues to baffle
> me.

It's quite simple.  en_US locale, and others like it sort by ignoring
things like white space and noise characters so that only letters and
numbers count, and things like ñ sort right near n, not at the end or
beginning of the table.

> It seems to me that if there are any alphanumeric characters in the string
> the rest are ignored in the sort.
> Where did this rule come from?
> I really would appreciate an explanation for this behavior.

It's been around quite some time.  I'm afraid I'll have to defer to
some other expert on the exact history.

Re: Order by and strings

From
Scott Marlowe
Date:
On Tue, Feb 9, 2010 at 1:42 AM, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote:
> I use locale en_US.UTF-8 but why this should affect how leading characters
> in strings are ignored is beyond me.

P.s. this page may shed some light on the subject:

http://en.wikipedia.org/wiki/Collation

Re: Order by and strings

From
Fredric Fredricson
Date:
Scott Marlowe wrote:
On Tue, Feb 9, 2010 at 1:42 AM, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote: 
I use locale en_US.UTF-8 but why this should affect how leading characters
in strings are ignored is beyond me.   
P.s. this page may shed some light on the subject:

http://en.wikipedia.org/wiki/Collation
OK, thanks. It did shed some light on the subject. Only I wonder what would happen if these sort algorithms where used on things like article numbers in the industry. That would confuse the hell out of the people there.
Alas, not my problem. I have a work around that works for my little part of the universe. At least for now.

You live, you learn.
/Fredric
Attachment

Re: Order by and strings

From
Magnus Hagander
Date:
On Tue, Feb 9, 2010 at 11:21, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote:
> Scott Marlowe wrote:
>
> On Tue, Feb 9, 2010 at 1:42 AM, Fredric Fredricson
> <Fredric.Fredricson@bonetmail.com> wrote:
>
>
> I use locale en_US.UTF-8 but why this should affect how leading characters
> in strings are ignored is beyond me.
>
>
> P.s. this page may shed some light on the subject:
>
> http://en.wikipedia.org/wiki/Collation
>
> OK, thanks. It did shed some light on the subject. Only I wonder what would
> happen if these sort algorithms where used on things like article numbers in
> the industry. That would confuse the hell out of the people there.

In case you are storing something like that, you may be better off
using bytea instead of text.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/