Re: order by different on mac vs linux - Mailing list pgsql-sql

From Scott Marlowe
Subject Re: order by different on mac vs linux
Date
Msg-id CAOR=d=2dEv2Cq7DWGoecVrYUr_Fm5r5cMbug55b_rNZfwVKcnQ@mail.gmail.com
Whole thread Raw
In response to Re: order by different on mac vs linux  (Wes James <comptekki@gmail.com>)
List pgsql-sql
On Wed, May 16, 2012 at 7:58 PM, Wes James <comptekki@gmail.com> wrote:
>
>
> On Wed, May 16, 2012 at 5:00 PM, Samuel Gendler <sgendler@ideasculptor.com>
> wrote:
>>
>>
>>
>> On Wed, May 16, 2012 at 3:46 PM, Wes James <comptekki@gmail.com> wrote:
>>>
>>>
>>>
>>> On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>
>>>> Wes James <comptekki@gmail.com> writes:
>>>> > Why is there a different order on the different platforms.
>>>>
>>>> This is not exactly unusual.  You should first check to see if
>>>> lc_collate is set differently in the two installations --- but even if
>>>> it's the same, there are often platform-specific interpretations of
>>>> the sorting rules.  (Not to mention that OS X is flat out broken when
>>>> it comes to sorting UTF8 data ...)
>>>>
>>>
>>> I just ran these:
>>>
>>> linux:
>>>
>>> on linux
>>>
>>> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM
>>> pg_user;
>>>  case
>>> -------
>>>  FALSE
>>> (1 row)
>>>
>>> # show lc_collate;
>>>
>>>  lc_collate
>>> -------------
>>>  en_US.UTF-8
>>> (1 row)
>>>
>>> ------------------------
>>>
>>> on mac os x:
>>>
>>> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM
>>> pg_user;
>>>  case
>>> ------
>>>  TRUE
>>> (1 row)
>>>
>>> # show lc_collate;
>>>  lc_collate
>>> -------------
>>>  en_US.UTF-8
>>> (1 row)
>>>
>>>
>>> -----------------------
>>>
>>> Why is the linux postgres saying false with the lc_collage set the way it
>>> is?
>>
>>
>> That's the point - UTF-8 collation is just completely broken under OS X.
>>  There's much previous discussion of the topic on this list and elsewhere.
>>  If you're developing on OS X but running linux and you are mostly using an
>> ascii character set in your test dataset, set your development OS X boxes to
>> use C collation, which will basically do what you expect it do do until you
>> start throwing multibyte characters at it.  If you can't constrain your
>> testing/development dataset in such a manner and collation order really
>> matters during development, then you probably shouldn't develop on OS X.  I
>> spent a fair amount of time investigating how to define a new charset in
>> what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to
>> behave just like ti does on Linux.  I just gave it up after wasting a few
>> too many hours on it. It may be possible to do it, but the return on
>> invested time was non-existent for me so I abandoned my effort.
>>
>
> Why are people saying os x is broken in my case?  Looking
> at  http://www.utf8-chartable.de/  and  http://www.asciitable.com/  for sort
> order, ! should come before apache.  On os x it is correct, on ubuntu linux
> it is not.  In the order by output per my previous emails, it is correct on
> os x, but no on linux.  Why do people keep saying os x is broken, when it is
> doing the correct thing from what I've seen?

You're confusing encoding with locale.  UTF-8 is an encoding.  If
there were no Locale (i.e. it was set to C, simple byte ordering) then
you'd be correct.  HOWEVER, a locale is a different animal altogether.For instance, most locales ignore many characters
whenit comes to 
sort ordering.  Such as spaces:

smarlowe=# create table test (i text);
CREATE TABLE
smarlowe=# insert into test values ('abc'),(' abc'),('def'),(' def');
INSERT 0 4
smarlowe=# select * from test order by i; i
------abc abcdef def
(4 rows)

Note the spaces are ignored for sorting purposes.


pgsql-sql by date:

Previous
From: Wes James
Date:
Subject: Re: order by different on mac vs linux
Next
From: Andreas
Date:
Subject: Need help in grouping records