Thread: order by question

order by question

From
Steve Clark
Date:
Hello,

I am confused by how postgres 8,4..13 is sorting my data.

\d test
     Table "public.test"
  Column | Type | Modifiers
--------+------+-----------
  data   | text |

select * from test order by data;
    data
----------

  -
  --
  1
  11
  11F
  1F
  a
  b
  C
  F
  -F
   Feneric
   Generic
(14 rows)

The first row is a single space, the next row a single -, the next two -- .
What I don't understand is why the '-F', the ' Feneric' and the ' Generic' sort where they do.

I would expect the output to be like this:

    data
----------

   Feneric
   Generic
  -
  --
  -F
  1
  11
  11F
  1F
  a
  b
  C
  F
(14 rows)

client_encoding
-----------------
  SQL_ASCII

  lc_collate
-------------
  en_US.UTF-8

foxboxconfig=# show lc_ctype;
   lc_ctype
-------------
  en_US.UTF-8


Thanks for any clarification.


--
Stephen Clark







Re: order by question

From
Kevin Grittner
Date:
Steve Clark <sclark@netwolves.com> wrote:

> I am confused by how postgres 8,4..13 is sorting my data.

> select * from test order by data;
>     data
> ----------
>
>   -
>   --
>   1
>   11
>   11F
>   1F
>   a
>   b
>   C
>   F
>   -F
>   Feneric
>   Generic
> (14 rows)
>
> The first row is a single space, the next row a single -, the next two -- .
> What I don't understand is why the '-F', the ' Feneric' and
> the ' Generic' sort where they do.
>
> I would expect the output to be like this:
>
>     data
> ----------
>
>   Feneric
>   Generic
>   -
>   --
>   -F
>   1
>   11
>   11F
>   1F
>   a
>   b
>   C
>   F
> (14 rows)

>   lc_collate
> -------------
>   en_US.UTF-8

PostgreSQL uses the OS collations.  What you are getting matches my
Ubuntu 14.04 machine:

kgrittn@Kevin-Desktop:~/pg/master$ echo $LANG
en_US.UTF-8
kgrittn@Kevin-Desktop:~/pg/master$ sort <<XXX
>
>   Feneric
>   Generic
>   -
>   --
>   -F
>   1
>   11
>   11F
>   1F
>   a
>   b
>   C
>   F
> XXX

  -
  --
  1
  11
  11F
  1F
  a
  b
  C
  F
  -F
  Feneric
  Generic

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: order by question

From
Steve Clark
Date:
On 08/07/2014 11:36 AM, Kevin Grittner wrote:
> Steve Clark <sclark@netwolves.com> wrote:
>
>> I am confused by how postgres 8,4..13 is sorting my data.
>> select * from test order by data;
>>       data
>> ----------
>>
>>     -
>>     --
>>     1
>>     11
>>     11F
>>     1F
>>     a
>>     b
>>     C
>>     F
>>     -F
>>     Feneric
>>     Generic
>> (14 rows)
>>
>> The first row is a single space, the next row a single -, the next two -- .
>> What I don't understand is why the '-F', the ' Feneric' and
>> the ' Generic' sort where they do.
>>
>> I would expect the output to be like this:
>>
>>       data
>> ----------
>>
>>     Feneric
>>     Generic
>>     -
>>     --
>>     -F
>>     1
>>     11
>>     11F
>>     1F
>>     a
>>     b
>>     C
>>     F
>> (14 rows)
>>     lc_collate
>> -------------
>>     en_US.UTF-8
> PostgreSQL uses the OS collations.  What you are getting matches my
> Ubuntu 14.04 machine:
>
> kgrittn@Kevin-Desktop:~/pg/master$ echo $LANG
> en_US.UTF-8
> kgrittn@Kevin-Desktop:~/pg/master$ sort <<XXX
>>     Feneric
>>     Generic
>>     -
>>     --
>>     -F
>>     1
>>     11
>>     11F
>>     1F
>>     a
>>     b
>>     C
>>     F
>> XXX
>    -
>    --
>    1
>    11
>    11F
>    1F
>    a
>    b
>    C
>    F
>    -F
>    Feneric
>    Generic
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
Hi Kevin,

Thanks for the response. I get the same thing as postgres if I sort from the command line too. But I don't understand
why.

I would expect '-F' to sort immediately after '-' and '--' not after 'F' as
well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort immediately after ' ' (space).

It is like the space character and the - in -Letter is ignored.


--
Stephen Clark

Re: order by question

From
Adrian Klaver
Date:
On 08/07/2014 08:53 AM, Steve Clark wrote:
> On 08/07/2014 11:36 AM, Kevin Grittner wrote:
>> Steve Clark <sclark@netwolves.com> wrote:
>>

>>
>> --
>> Kevin Grittner
>> EDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
> Hi Kevin,
>
> Thanks for the response. I get the same thing as postgres if I sort from
> the command line too. But I don't understand why.

Because that is how en_US.UTF-8 sorts. You are probably looking for the
'C' sort order. For more explanations see:

http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html

http://superuser.com/questions/227925/in-utf-8-collation-why-11-is-less-then-1

>
> I would expect '-F' to sort immediately after '-' and '--' not after 'F' as
> well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort
> immediately after ' ' (space).
>
> It is like the space character and the - in -Letter is ignored.

See the second link above for why that is.

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: order by question

From
Kevin Grittner
Date:
Steve Clark <sclark@netwolves.com> wrote:

> It is like the space character and the - in -Letter is ignored.

Yes, that is how the en_US collation is defined.  I think the goal
is to make it something like "phone book" ordering.  If you still
have a "white pages" book around, look at how a business name with
a hyphen (or a hyphenated last name) is sorted there.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: order by question

From
Steve Clark
Date:
On 08/07/2014 12:10 PM, Adrian Klaver wrote:
> On 08/07/2014 08:53 AM, Steve Clark wrote:
>> On 08/07/2014 11:36 AM, Kevin Grittner wrote:
>>> Steve Clark <sclark@netwolves.com> wrote:
>>>
>>> --
>>> Kevin Grittner
>>> EDB: http://www.enterprisedb.com
>>> The Enterprise PostgreSQL Company
>>>
>> Hi Kevin,
>>
>> Thanks for the response. I get the same thing as postgres if I sort from
>> the command line too. But I don't understand why.
> Because that is how en_US.UTF-8 sorts. You are probably looking for the
> 'C' sort order. For more explanations see:
>
> http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html
>
> http://superuser.com/questions/227925/in-utf-8-collation-why-11-is-less-then-1
>
>> I would expect '-F' to sort immediately after '-' and '--' not after 'F' as
>> well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort
>> immediately after ' ' (space).
>>
>> It is like the space character and the - in -Letter is ignored.
> See the second link above for why that is.
>
>>
>
Thanks Adrian and Kevin,

for the explanations and links. They were very elucidating. At least now I understand what is happening.

Thanks again.

--
Stephen Clark



Re: order by question

From
Steve Clark
Date:
On 08/07/2014 11:36 AM, Kevin Grittner wrote:
> Steve Clark <sclark@netwolves.com> wrote:
>
>> I am confused by how postgres 8,4..13 is sorting my data.
>> select * from test order by data;
>>       data
>> ----------
>>
>>     -
>>     --
>>     1
>>     11
>>     11F
>>     1F
>>     a
>>     b
>>     C
>>     F
>>     -F
>>     Feneric
>>     Generic
>> (14 rows)
>>
>> The first row is a single space, the next row a single -, the next two -- .
>> What I don't understand is why the '-F', the ' Feneric' and
>> the ' Generic' sort where they do.
>>
>> I would expect the output to be like this:
>>
>>       data
>> ----------
>>
>>     Feneric
>>     Generic
>>     -
>>     --
>>     -F
>>     1
>>     11
>>     11F
>>     1F
>>     a
>>     b
>>     C
>>     F
>> (14 rows)
>>     lc_collate
>> -------------
>>     en_US.UTF-8
> PostgreSQL uses the OS collations.  What you are getting matches my
> Ubuntu 14.04 machine:
>
> kgrittn@Kevin-Desktop:~/pg/master$ echo $LANG
> en_US.UTF-8
> kgrittn@Kevin-Desktop:~/pg/master$ sort <<XXX
>>     Feneric
>>     Generic
>>     -
>>     --
>>     -F
>>     1
>>     11
>>     11F
>>     1F
>>     a
>>     b
>>     C
>>     F
>> XXX
>    -
>    --
>    1
>    11
>    11F
>    1F
>    a
>    b
>    C
>    F
>    -F
>    Feneric
>    Generic
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
Hi Kevin,

Thanks for the response. I get the same thing as postgres if I sort from the command line too. But I don't understand
why.

I would expect '-F' to sort immediately after '-' and '--' not after 'F' as
well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort immediately after ' ' (space).

It is like the space character and the - in -Letter is ignored.


--
Stephen Clark