Thread: "for SELECT DISTINCT, ORDER BY expressions must appear in select list" - is that the standart or a limitation of postgresql?

CREATE TABLE test
( id int4 NOT NULL DEFAULT nextval('teste_id_seq'::regclass), name varchar, number int4
)
WITHOUT OIDS;

The data:

1;"walter";1
2;"walter";1
3;"walter";1
4;"walter";1
5;"walter";2
6;"walter";3
7;"rodrigo";1
8;"rodrigo";2
9;"rodrigo";3

The query:

SELECT distinct name from test order by number

(well, I think that que query doesn't make any sense, but raises the error :) )

The error: ERROR:  for SELECT DISTINCT, ORDER BY expressions must
appear in select list is due to a standart implementarion or a design
decision of postgres?

The comentary on parse_clause.c looks like the second option. I'm right?

[]'s
- Walter


Walter Cruz wrote:
> 
> SELECT distinct name from test order by number
> 
> (well, I think that que query doesn't make any sense, but raises the 
> error :) )
> 
> The error: ERROR:  for SELECT DISTINCT, ORDER BY expressions must
> appear in select list is due to a standart implementarion or a design
> decision of postgres?

I think ORDER BY is defined to take place after DISTINCT, which means 
there is no meaningful "number" for it to order by. You could 
arbitrarily choose the first number encountered, but I can't see what 
sense it would make to order by them.

--   Richard Huxton  Archonet Ltd


On Feb 15, 2007, at 22:35 , Richard Huxton wrote:

> Walter Cruz wrote:
>> SELECT distinct name from test order by number
>> (well, I think that que query doesn't make any sense, but raises  
>> the error :) )
>> The error: ERROR:  for SELECT DISTINCT, ORDER BY expressions must
>> appear in select list is due to a standart implementarion or a design
>> decision of postgres?
>
> I think ORDER BY is defined to take place after DISTINCT, which  
> means there is no meaningful "number" for it to order by. You could  
> arbitrarily choose the first number encountered, but I can't see  
> what sense it would make to order by them.

I believe the reason is that DISTINCT depends on a sort to determine  
uniqueness (distinctness), so it's a implementation detail that if  
you're going to include an ORDER BY, you also need to include the  
same columns in the ORDER BY in the DISTINCT clause. Though I suspect  
Richard is right that ORDER BY takes place after DISTINCT. (My  
cursory attempt at parsing the SQL 2003 draft failed me.)

On further thought, I bet

SELECT DISTINCT name
FROM test
ORDER BY name, number

fails with a different error, one directly supporting Richard's  
conclusion.

Michael Glaesemann
grzm seespotcode net




Michael Glaesemann <grzm@seespotcode.net> writes:
> On Feb 15, 2007, at 22:35 , Richard Huxton wrote:
>> Walter Cruz wrote:
>>> The error: ERROR:  for SELECT DISTINCT, ORDER BY expressions must
>>> appear in select list is due to a standart implementarion or a design
>>> decision of postgres?
>> 
>> I think ORDER BY is defined to take place after DISTINCT, which  
>> means there is no meaningful "number" for it to order by. You could  
>> arbitrarily choose the first number encountered, but I can't see  
>> what sense it would make to order by them.

> I believe the reason is that DISTINCT depends on a sort to determine  
> uniqueness (distinctness), so it's a implementation detail that if  
> you're going to include an ORDER BY, you also need to include the  
> same columns in the ORDER BY in the DISTINCT clause.

No, there's actually a definitional reason for it.  Consider
SELECT DISTINCT x FROM tab ORDER BY y;

For any particular x-value in the table there might be many different y
values.  Which one will you use to sort that x-value in the output?

Back in SQL92 they avoided this problem by specifying that ORDER BY
entries had to reference output columns.  SQL99 has some messy verbiage
that I think comes out at the same place as our restriction:
                  A) If K(i) is not equivalent to a <value expression>                     immediately contained in any
<derivedcolumn> in the                     <select list> SL of <query specification> QS contained
inQE, then:
 
                     I) T shall not be a grouped table.
                    II) QS shall not specify the <set quantifier> DISTINCT                       or directly contain
oneor more <set function                       specification>s.
 

        regards, tom lane


Thanks Tom, Thank all :)

Maybe the commentary on parse_clase.c (beggining with "Now, DISTINCT
list consists of all non-resjunk") needs to be updated - In the
comment, looks likes this is a postgresql limitation.

[]'s
- Walter

On 2/15/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Glaesemann <grzm@seespotcode.net> writes:
> > On Feb 15, 2007, at 22:35 , Richard Huxton wrote:
> >> Walter Cruz wrote:
> >>> The error: ERROR:  for SELECT DISTINCT, ORDER BY expressions must
> >>> appear in select list is due to a standart implementarion or a design
> >>> decision of postgres?
> >>
> >> I think ORDER BY is defined to take place after DISTINCT, which
> >> means there is no meaningful "number" for it to order by. You could
> >> arbitrarily choose the first number encountered, but I can't see
> >> what sense it would make to order by them.
>
> > I believe the reason is that DISTINCT depends on a sort to determine
> > uniqueness (distinctness), so it's a implementation detail that if
> > you're going to include an ORDER BY, you also need to include the
> > same columns in the ORDER BY in the DISTINCT clause.
>
> No, there's actually a definitional reason for it.  Consider
>
>         SELECT DISTINCT x FROM tab ORDER BY y;
>
> For any particular x-value in the table there might be many different y
> values.  Which one will you use to sort that x-value in the output?
>
> Back in SQL92 they avoided this problem by specifying that ORDER BY
> entries had to reference output columns.  SQL99 has some messy verbiage
> that I think comes out at the same place as our restriction:
>
>                    A) If K(i) is not equivalent to a <value expression>
>                       immediately contained in any <derived column> in the
>                       <select list> SL of <query specification> QS contained
>                       in QE, then:
>
>                       I) T shall not be a grouped table.
>
>                      II) QS shall not specify the <set quantifier> DISTINCT
>                         or directly contain one or more <set function
>                         specification>s.
>
>
>                         regards, tom lane
>