Question on Bizarre Sorting (ORDER BY in 7.1) - Mailing list pgsql-general

From
Subject Question on Bizarre Sorting (ORDER BY in 7.1)
Date
Msg-id Pine.LNX.4.30.0104241043480.5175-100000@crazypenguins.commandprompt.com
Whole thread Raw
In response to Re: installing DBD::Pg without installing postgres  (Fran Fabrizio <ffabrizio@Exchange.WebMD.net>)
Responses Re: Question on Bizarre Sorting (ORDER BY in 7.1)
List pgsql-general
Good day,

We've run into a strange bit of sorting behavior with the new release of
PostgreSQL 7.1. Specifically, we have some text that we're using as
threadids in a discussion board, which look like the following example:

       threadid
----------------------
 000-0987877374-00313
 ___-0987877410-00316
 ___-0987877430-00317
 100-0987877381-00314
 100-0987877395-00315
 200-0987877461-00318

The signifigance of the numbers is secondary to the alphanumeric sorting
of them. You can see above that the first three characters are either
numeric or underscores. We were using the underscores as a means to force
"unrated" threads to be sorted after rated threads, and with PostgreSQL
7.0.3, and with some CVS snapshots for 7.1, it worked fine! If I performed
the query:

lxp=# SELECT threadid FROM test ORDER BY threadid;

I'd get:

       threadid
----------------------
 000-0987877374-00313
 100-0987877381-00314
 100-0987877395-00315
 200-0987877461-00318
 ___-0987877410-00316
 ___-0987877430-00317

However, at some point between the last snapshot we grabbed (several weeks
ago) and the release of 7.1, this behavior has changed. If I do the same
sort now, I get:

lxp=# SELECT threadid FROM test ORDER BY threadid;
       threadid
----------------------
 000-0987877374-00313
 ___-0987877410-00316
 ___-0987877430-00317
 100-0987877381-00314
 100-0987877395-00315
 200-0987877461-00318
(6 rows)

At first blush, it seems that it's somehow coming to the conclusion that
the underscore alphanumerically follows the 0, and preceds the 1. (?!)
However, that's not the end of it! Observe this unpredictable behavior
with ordering by substrings:

lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1, 5);
 substr
--------
 ___-0
 ___-0
 000-0
 100-0
 100-0
 200-0
(6 rows)

Now, the underscores appear to PRECEDE the 0's. This seems at least a
little more sane, however this is completely the opposite of where the
underscore would be sorted with 7.0.3. Now consider the next substring, of
six characters instead of five.

lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid, 1, 6);
 substr
---------
 000-09
 ___-09
 ___-09
 100-09
 100-09
 200-09
(6 rows)

Back to the underscores fitting between 0 and 1 again, simply by adding
a 9 to the end of the ids. Logically, I'm at a loss for why this should be.

I've already re-factored my system to use purely numeric values for
sorting, because it was impairing the capability of our message boards to
be properly sequenced, but I was interested in knowing whether or not this
is a bug, a change in the way PostgreSQL sorts, or possibly some kind of
locale-specific misconfiguration?



Any insight would be appreciated,
Jw @ Command Prompt.
--
By way of pgsql-general@commandprompt.com.


pgsql-general by date:

Previous
From: will trillich
Date:
Subject: Re: Displaying keys and triggers of tables?
Next
From: "Richard Huxton"
Date:
Subject: Re: Problem with postgreSQL (number of backends)