Re: Question on Bizarre Sorting (ORDER BY in 7.1) (fwd) - Mailing list pgsql-general

From Barry Lind
Subject Re: Question on Bizarre Sorting (ORDER BY in 7.1) (fwd)
Date
Msg-id 3AE707AB.7090107@xythos.com
Whole thread Raw
In response to Re: Question on Bizarre Sorting (ORDER BY in 7.1) (fwd)  (<pgsql-general@commandprompt.com>)
List pgsql-general
The behavior seen here seems correct although not intuitive.  My guess
is that your database executables where created with locale support
enabled and your database was inited with a US locale.  The sorting
logic for the english locales only sort on alpha-numeric characters.
All other characters are ignored for sorting purposes.

So these values:

000-0987877374-00313
___-0987877410-00316
___-0987877430-00317
100-0987877381-00314
100-0987877395-00315
200-0987877461-00318
get sorted as:

000098787737400313
098787741000316
098787743000317
100098787738100314
100098787739500315
200098787746100318
thus you see it is sorted correctly.  If you look at all of the other
examples you will see this explains all of the results you are seeing.

What I think you really want here is a binary sort order, not a locale
specific sort order.  So you should initdb your database with a locale
of C, or rebuild postgresql without locale support enabled.

I think there is a contrib program that you can use to see what locale
your database was created with.  Once your database is created you can't
change the locale, you will need to re-initdb.

thanks,
--Barry

Justin Clift wrote:

> Dunno if this is helpful, but there is a pattern.
>
> After stripping off the leading non-numeric characters, from the results
> in the previous email, this leaves :
>
> lxp=# SELECT threadid FROM test ORDER BY threadid;
>        threadid
> ----------------------
> 000-0987877374-00313
> 0987877410-00316
> 0987877430-00317
> 100-0987877381-00314
> 100-0987877395-00315
> 200-0987877461-00318
> (6 rows)
>
> lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1,
> 5);
>  substr
> --------
> 0
> 0
> 000-0
> 100-0
> 100-0
> 200-0
> (6 rows)
>
> lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid,
> 1,6);
>  substr
> ---------
> 000-09
> 09
> 09
> 100-09
> 100-09
> 200-09
> (6 rows)
>
> As for why?  Don't know.
>
> But the sorting is consistent.
>
> Regards and best wishes,
>
> Justin Clift
>
>
> pgsql-general@commandprompt.com wrote:
>
>> On Tue, 24 Apr 2001, Clayton Vernon wrote:
>>
>>> seems to be treating 'threadid' as numeric, not alpha.
>>> did you run any macros that needed to guess the format of the field? many of
>>> these are very poor in terms of looking down the table. sometimes the first
>>> row can inadvertently define things for you.
>>
>> Nope, to my knowledge nothing is going on except for normal INSERTs
>> and SELECTs. What confuses me is that 'threadid' is of type text, and even
>> if it were being treated as if it were of type numeric, its results are
>> still inconsistent, and somewhat inscrutable.
>>
>> I can't see any reasonable way that this sorting:
>>
>>>> lxp=# SELECT threadid FROM test ORDER BY threadid;
>>>>       threadid
>>>> ----------------------
>>>> 000-0987877374-00313
>>>> ___-0987877410-00316
>>>> ___-0987877430-00317
>>>> 100-0987877381-00314
>>>> 100-0987877395-00315
>>>> 200-0987877461-00318
>>>> (6 rows)
>>>
>> ...would be occuring, unless it's going out of its way to try to strip
>> non-numeric characters and treat the remaining assembled numbers as a
>> whole number to sort by, possibly?
>>
>> But if that were the case, that still doesn't explain why the substrings
>> (below) behave the way that they do, in re-ordering the underscores.
>>
>>> lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1, 5);
>>> substr
>>> --------
>>> ___-0
>>> ___-0
>>> 000-0
>>> 100-0
>>> 100-0
>>> 200-0
>>> (6 rows)
>>>
>>>> Now, the underscores appear to PRECEDE the 0's. This seems at least a
>>>
>>> little more sane, however this is completely the opposite of where the
>>> underscore would be sorted with 7.0.3. Now consider the next substring, of
>>> six characters instead of five.
>>>
>>> lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid, 1,6);
>>> substr
>>> ---------
>>> 000-09
>>> ___-09
>>> ___-09
>>> 100-09
>>> 100-09
>>> 200-09
>>> (6 rows)
>>
>> Thanks, though still puzzled,
>> Jw @ Command Prompt.
>> --
>> By way of pgsql-general@commandprompt.com
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


pgsql-general by date:

Previous
From: jdassen@cistron.nl (J.H.M. Dassen (Ray))
Date:
Subject: Re: indices are crashed after installation of rpm
Next
From: Bruce Momjian
Date:
Subject: Translations of PostgreSQL: Introduction and Concepts