Thread: Seemingly inconsistent ORDER BY behavior

Seemingly inconsistent ORDER BY behavior

From
Richard Hipp
Date:
Consider the following SQL:

---------------------------------------------------
CREATE TABLE t1(m VARCHAR(4));
INSERT INTO t1 VALUES('az');
INSERT INTO t1 VALUES('by');
INSERT INTO t1 VALUES('cx');

SELECT '1', substr(m,2) AS m
  FROM t1
 ORDER BY m;

SELECT '2', substr(m,2) AS m
  FROM t1
 ORDER BY lower(m);
---------------------------------------------------

Using PostgreSQL 9.1.5, the first query returns x-y-z while the second returns z-y-x.  Is this "correct"?  It certainly is surprising to me.

I'm asking because the same question has come up in SQLite and whenever there is a language dispute in SQLite, our first instinct is to find out what PostgreSQL does and try to do the same thing.  SQLite version 3.7.15 was behaving the same way as PostgreSQL 9.1.5 (unbeknownst to us at the time).  Then a bug was written about the inconsistent behavior of ORDER BY.  We fixed that bug so that the latest SQLite answers x-y-z in both cases.  Now someone is complaining that the "fix" was really a 'break".  Is it?  Or is there an equivalent bug in PostgreSQL? 

There are, of course, many ways to resolve the ambiguity (such as using a unique label for the result column, or by saying "t1.m" instead of just "m" when you mean the column of the table).  But that is not really the point here.  The question is, how should symbolic names in the ORDER BY clause be resolved?  Should column names in the source table take precedence over result column name, or should it be the other way around?

Any insights are appreciated.  Please advise if a different mailing list would be more appropriate for this question.
--
D. Richard Hipp
drh@sqlite.org

Re: Seemingly inconsistent ORDER BY behavior

From
Scott Marlowe
Date:
On Wed, Aug 14, 2013 at 12:01 PM, Richard Hipp <drh@sqlite.org> wrote:
>
> Consider the following SQL:
>
> ---------------------------------------------------
> CREATE TABLE t1(m VARCHAR(4));
> INSERT INTO t1 VALUES('az');
> INSERT INTO t1 VALUES('by');
> INSERT INTO t1 VALUES('cx');
>
> SELECT '1', substr(m,2) AS m
>   FROM t1
>  ORDER BY m;
>
> SELECT '2', substr(m,2) AS m
>   FROM t1
>  ORDER BY lower(m);
> ---------------------------------------------------
>
> Using PostgreSQL 9.1.5, the first query returns x-y-z while the second returns z-y-x.  Is this "correct"?  It
certainlyis surprising to me. 
>
> I'm asking because the same question has come up in SQLite and whenever there is a language dispute in SQLite, our
firstinstinct is to find out what PostgreSQL does and try to do the same thing.  SQLite version 3.7.15 was behaving the
sameway as PostgreSQL 9.1.5 (unbeknownst to us at the time).  Then a bug was written about the inconsistent behavior of
ORDERBY.  We fixed that bug so that the latest SQLite answers x-y-z in both cases.  Now someone is complaining that the
"fix"was really a 'break".  Is it?  Or is there an equivalent bug in PostgreSQL? 
>
> There are, of course, many ways to resolve the ambiguity (such as using a unique label for the result column, or by
saying"t1.m" instead of just "m" when you mean the column of the table).  But that is not really the point here.  The
questionis, how should symbolic names in the ORDER BY clause be resolved?  Should column names in the source table take
precedenceover result column name, or should it be the other way around? 
>
> Any insights are appreciated.  Please advise if a different mailing list would be more appropriate for this question.

My guess without testing is that order by lower(m) is not what you think it is.

substr(m,2) as m

is bad form. Always use a new and unique alias, like m1.  How does this work:

SELECT '2', substr(m,2) AS m1
  FROM t1
 ORDER BY lower(m1);


Re: Seemingly inconsistent ORDER BY behavior

From
Richard Hipp
Date:


On Wed, Aug 14, 2013 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Wed, Aug 14, 2013 at 12:01 PM, Richard Hipp <drh@sqlite.org> wrote:

substr(m,2) as m

is bad form. Always use a new and unique alias, like m1.  How does this work:

SELECT '2', substr(m,2) AS m1
  FROM t1
 ORDER BY lower(m1);

Tnx.  I think everybody agrees that "substr(m,2) as m" is bad form.  And all the database engines get the same consistent answer when you avoid the bad form and use "substr(m,2) as m1" instead.  The question is, what should the database engine do when the programmer disregards sounds advice and uses the bad form anyhow?

--
D. Richard Hipp
drh@sqlite.org

Re: Seemingly inconsistent ORDER BY behavior

From
Scott Marlowe
Date:
On Wed, Aug 14, 2013 at 12:31 PM, Richard Hipp <drh@sqlite.org> wrote:
>
>
> On Wed, Aug 14, 2013 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Wed, Aug 14, 2013 at 12:01 PM, Richard Hipp <drh@sqlite.org> wrote:
>>
>> substr(m,2) as m
>>
>> is bad form. Always use a new and unique alias, like m1.  How does this
>> work:
>>
>> SELECT '2', substr(m,2) AS m1
>>   FROM t1
>>  ORDER BY lower(m1);
>
>
> Tnx.  I think everybody agrees that "substr(m,2) as m" is bad form.  And all
> the database engines get the same consistent answer when you avoid the bad
> form and use "substr(m,2) as m1" instead.  The question is, what should the
> database engine do when the programmer disregards sounds advice and uses the
> bad form anyhow?

My guess is that either the SQL spec says it's system determined OR
that the way postgres does it is right. And I'm leaning towards the
second. Someone with the spec hand would have to look it up.


Re: Seemingly inconsistent ORDER BY behavior

From
Tom Lane
Date:
Richard Hipp <drh@sqlite.org> writes:
> There are, of course, many ways to resolve the ambiguity (such as using a
> unique label for the result column, or by saying "t1.m" instead of just "m"
> when you mean the column of the table).  But that is not really the point
> here.  The question is, how should symbolic names in the ORDER BY clause be
> resolved?  Should column names in the source table take precedence over
> result column name, or should it be the other way around?

Our interpretation is that a bare column name ("ORDER BY foo") is resolved
first as an output-column label, or failing that as an input-column name.
However, as soon as you embed a name in an expression, it will be treated
*only* as an input column name.

The SQL standard is not a lot of help here.  In SQL92, the only allowed
forms of ORDER BY arguments were an output column name or an output column
number.  SQL99 and later dropped that definition (acknowledging that they
were being incompatible) and substituted some fairly impenetrable verbiage
that seems to boil down to allowing input column names that can be within
expressions.  At least that's how we've chosen to read it.  Our current
behavior is a compromise that tries to support both editions of the spec.

            regards, tom lane


Re: Seemingly inconsistent ORDER BY behavior

From
Richard Hipp
Date:



On Wed, Aug 14, 2013 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Our interpretation is that a bare column name ("ORDER BY foo") is resolved
first as an output-column label, or failing that as an input-column name.
However, as soon as you embed a name in an expression, it will be treated
*only* as an input column name.

The SQL standard is not a lot of help here.  In SQL92, the only allowed
forms of ORDER BY arguments were an output column name or an output column
number.  SQL99 and later dropped that definition (acknowledging that they
were being incompatible) and substituted some fairly impenetrable verbiage
that seems to boil down to allowing input column names that can be within
expressions.  At least that's how we've chosen to read it.  Our current
behavior is a compromise that tries to support both editions of the spec.


Thanks for the explanation, Tom. 

Just to be clear, you intend that a COLLATE clause in the ORDER BY is treated as an expression, right?  So that the two queries in the following SQL output rows in the opposite order:

------------------------
CREATE TABLE t1(m VARCHAR(4));
INSERT INTO t1 VALUES('az');
INSERT INTO t1 VALUES('by');
INSERT INTO t1 VALUES('cx');
SELECT '1', substr(m,2) AS m FROM t1 ORDER BY m;
SELECT '2', substr(m,2) AS m FROM t1 ORDER BY m COLLATE "POSIX";
------------------------

If that is not correct, please let me know because I am about to change SQLite to work exactly as PostgreSQL does.

--
D. Richard Hipp
drh@sqlite.org

Re: Seemingly inconsistent ORDER BY behavior

From
rob stone
Date:

On Wed, 2013-08-14 at 14:01 -0400, Richard Hipp wrote:
> CREATE TABLE t1(m VARCHAR(4));
> INSERT INTO t1 VALUES('az');
> INSERT INTO t1 VALUES('by');
> INSERT INTO t1 VALUES('cx');
>
> SELECT '1', substr(m,2) AS m
>   FROM t1
>  ORDER BY m;
>
> SELECT '2', substr(m,2) AS m
>   FROM t1
>  ORDER BY lower(m);


You cannot cast your ORDER BY column value.

Instead:-

SELECT '2', LOWER(substr(m,2)) AS m
FROM t1
ORDER BY m;

will have the desired effect.



Re: Seemingly inconsistent ORDER BY behavior

From
Tom Lane
Date:
Richard Hipp <drh@sqlite.org> writes:
> Just to be clear, you intend that a COLLATE clause in the ORDER BY is
> treated as an expression, right?

Yeah, that is the current behavior, although I think that it probably fell
out rather than being consciously thought about.  Arguably it's the wrong
thing, because it looks to me like SQL92 allowed

       select ... order by 1 collate "something"

which our current code will interpret as ordering by a constant (and then
complain that you can't collate an int...).  Don't know if it's worth
changing --- we have our own backwards-compatibility concerns to think
about, and there haven't been any field complaints about this.  I doubt
there's any pressure at this point to become more compatible with SQL92,
as opposed to more compatible with later editions.

            regards, tom lane


Re: Seemingly inconsistent ORDER BY behavior

From
BladeOfLight16
Date:
On Wed, Aug 14, 2013 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Our interpretation is that a bare column name ("ORDER BY foo") is resolved
first as an output-column label, or failing that as an input-column name.
However, as soon as you embed a name in an expression, it will be treated
*only* as an input column name.

The SQL standard is not a lot of help here.  In SQL92, the only allowed
forms of ORDER BY arguments were an output column name or an output column
number.  SQL99 and later dropped that definition (acknowledging that they
were being incompatible) and substituted some fairly impenetrable verbiage
that seems to boil down to allowing input column names that can be within
expressions.  At least that's how we've chosen to read it.  Our current
behavior is a compromise that tries to support both editions of the spec.

Asking as a comparative know-nothing who would like to be more informed, is there something wrong with the notion of throwing an error that m in the ORDER BY clause is ambiguous here? As near as I can tell, it really is ambiguous as long as both input or output columns are accepted, so either way is essentially a total guess about what the user wants. It seems to me that throwing an error would be the most intuitive and clearly defined way of handling this case.

Re: Seemingly inconsistent ORDER BY behavior

From
Scott Marlowe
Date:
On Fri, Aug 16, 2013 at 8:37 PM, BladeOfLight16
<bladeoflight16@gmail.com> wrote:
> On Wed, Aug 14, 2013 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Our interpretation is that a bare column name ("ORDER BY foo") is resolved
>> first as an output-column label, or failing that as an input-column name.
>> However, as soon as you embed a name in an expression, it will be treated
>> *only* as an input column name.
>>
>> The SQL standard is not a lot of help here.  In SQL92, the only allowed
>> forms of ORDER BY arguments were an output column name or an output column
>> number.  SQL99 and later dropped that definition (acknowledging that they
>> were being incompatible) and substituted some fairly impenetrable verbiage
>> that seems to boil down to allowing input column names that can be within
>> expressions.  At least that's how we've chosen to read it.  Our current
>> behavior is a compromise that tries to support both editions of the spec.
>
>
> Asking as a comparative know-nothing who would like to be more informed, is
> there something wrong with the notion of throwing an error that m in the
> ORDER BY clause is ambiguous here? As near as I can tell, it really is
> ambiguous as long as both input or output columns are accepted, so either
> way is essentially a total guess about what the user wants. It seems to me
> that throwing an error would be the most intuitive and clearly defined way
> of handling this case.

Well it's not likely that the current behaviour will be changed since
there are likely apps that rely on it working (sort of) the way it is.

A warning or notice might make sense then.

--
To understand recursion, one must first understand recursion.