Thread: SELECT DISTINCT question

SELECT DISTINCT question

From
Oleg Bartunov
Date:
I got a problem with query:

select distinct (date), bytes  from access_log;

which works but not as I expect. I thought this query will select
all rows with distinct values of 'date' column, but it get
distinct pairs 'date, bytes' . From documnetation I see

"DISTINCT will eliminate all duplicate rows from the selection. 
DISTINCT ON column will eliminate all duplicates in the specified column; 
this is equivalent to using GROUP BY column. 
ALL will return all candidate rows, including duplicates."      
discovery=> select distinct on date,bytes  from access_log;
ERROR:  parser: parse error at or near ","
Regards,
    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




Re: [HACKERS] SELECT DISTINCT question

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> discovery=> select distinct on date,bytes  from access_log;
> ERROR:  parser: parse error at or near ","

The syntax for SELECT DISTINCT ON is just as brain-damaged as the
functionality itself: there's no comma after the column name.
You want

select distinct on date  date,bytes  from access_log;

The reason the functionality is brain-damaged is that there's no way to
know which tuple out of the set of tuples with a given "date" value will
be the one returned.

SELECT DISTINCT ON is not in SQL92, and I think it shouldn't be in
Postgres either...
        regards, tom lane


Re: [SQL] Re: [HACKERS] SELECT DISTINCT question

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> thanks, this works. But why parser complains about such query:

> discovery=> select distinct on a.date  a.date, a.bytes  from access_log a;
> ERROR:  parser: parse error at or near "."

Probably the grammar specifies just <column name> and not anything
more complex after DISTINCT ON.  It'd be risky to try to accept a
general expression after ON, due to the silly decision to leave out
any terminating punctuation.

>> SELECT DISTINCT ON is not in SQL92, and I think it shouldn't be in
>> Postgres either...

> I'm not an SQL expert, but if it works and this feature is in standard
> but only syntax is diffrent,

No, there is no feature in SQL that allows DISTINCT on a subset of
columns, period.  This is not merely a matter of syntax, it's a
fundamental semantic issue.

> why just not to use standard
>
> select distinct(date), bytes  from access_log;
>
> Or I'm missing here ?

I don't think that does what you expect it to (hint: the
parentheses are redundant).
        regards, tom lane


Re: [SQL] Re: [HACKERS] SELECT DISTINCT question

From
Oleg Bartunov
Date:
On Sat, 10 Jul 1999, Tom Lane wrote:

> Date: Sat, 10 Jul 1999 17:18:28 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Oleg Bartunov <oleg@sai.msu.su>
> Cc: hackers@postgreSQL.org, pgsql-sql@postgreSQL.org
> Subject: [SQL] Re: [HACKERS] SELECT DISTINCT question 
> 
> Oleg Bartunov <oleg@sai.msu.su> writes:
> > discovery=> select distinct on date,bytes  from access_log;
> > ERROR:  parser: parse error at or near ","
> 
> The syntax for SELECT DISTINCT ON is just as brain-damaged as the
> functionality itself: there's no comma after the column name.
> You want
> 
> select distinct on date  date,bytes  from access_log;
> 

thanks, this works. But why parser complains about such query:

discovery=> select distinct on a.date  a.date, a.bytes  from access_log a;
ERROR:  parser: parse error at or near "."

In this query I could just omit '.', but in more complex query 
I probably could need one.

> The reason the functionality is brain-damaged is that there's no way to
> know which tuple out of the set of tuples with a given "date" value will
> be the one returned.
> 
> SELECT DISTINCT ON is not in SQL92, and I think it shouldn't be in
> Postgres either...


I'm not an SQL expert, but if it works and this feature is in standard
but only syntax is diffrent, why just not to use standard

select distinct(date), bytes  from access_log;

Or I'm missing here ?

Regards,    Oleg
> 
>             regards, tom lane
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: [HACKERS] SELECT DISTINCT question

From
Hannu Krosing
Date:
Oleg Bartunov wrote:
> 
> I got a problem with query:
> 
> select distinct (date), bytes  from access_log;
> 
> which works but not as I expect. I thought this query will select
> all rows with distinct values of 'date' column, but it get
> distinct pairs 'date, bytes' . From documnetation I see
> 
> "DISTINCT will eliminate all duplicate rows from the selection.
> DISTINCT ON column will eliminate all duplicates in the specified column;
> this is equivalent to using GROUP BY column.

If it is equivalent to GROUP BY then it should allow only aggregates 
in non-distinct columns, like:

select distinct on date date, sum(bytes) from access_log;

If it does not, then it should be files as a bug imho.

-----------------
Hannu


Re: [HACKERS] SELECT DISTINCT question

From
Tom Lane
Date:
Hannu Krosing <hannu@trust.ee> writes:
>> "DISTINCT will eliminate all duplicate rows from the selection.
>> DISTINCT ON column will eliminate all duplicates in the specified column;
>> this is equivalent to using GROUP BY column."

> If it is equivalent to GROUP BY then it should allow only aggregates 
> in non-distinct columns, like:
> select distinct on date date, sum(bytes) from access_log;
> If it does not, then it should be files as a bug imho.

It does not.  Whether that is a bug is hard to say, since there is no
standard I know of that says what it *is* supposed to do.

If you look at the select_distinct_on regress test outputs, I bet you
will be even less happy:

QUERY: SELECT DISTINCT ON string4 two, string4, ten   FROM tmp  ORDER BY two using <, string4 using <, ten using <;
two|string4|ten
---+-------+--- 0|AAAAxx |  0 0|HHHHxx |  0 0|OOOOxx |  0 0|VVVVxx |  0 1|AAAAxx |  1 1|HHHHxx |  1 1|OOOOxx |  1
1|VVVVxx|  1
 
(8 rows)

That's not exactly my idea of "distinct" values of string4 ---
but apparently whoever made up the regress test thought it was OK!

Can anyone defend this feature or provide a coherent definition
of what it's supposed to be doing?  My urge to rip it out is
growing stronger and stronger...
        regards, tom lane


Re: [SQL] Re: [HACKERS] SELECT DISTINCT question

From
Bruce Momjian
Date:
Tom, any status on this DISTINCT ON ripout?


> Hannu Krosing <hannu@trust.ee> writes:
> >> "DISTINCT will eliminate all duplicate rows from the selection.
> >> DISTINCT ON column will eliminate all duplicates in the specified column;
> >> this is equivalent to using GROUP BY column."
> 
> > If it is equivalent to GROUP BY then it should allow only aggregates 
> > in non-distinct columns, like:
> > select distinct on date date, sum(bytes) from access_log;
> > If it does not, then it should be files as a bug imho.
> 
> It does not.  Whether that is a bug is hard to say, since there is no
> standard I know of that says what it *is* supposed to do.
> 
> If you look at the select_distinct_on regress test outputs, I bet you
> will be even less happy:
> 
> QUERY: SELECT DISTINCT ON string4 two, string4, ten
>        FROM tmp
>    ORDER BY two using <, string4 using <, ten using <;
> two|string4|ten
> ---+-------+---
>   0|AAAAxx |  0
>   0|HHHHxx |  0
>   0|OOOOxx |  0
>   0|VVVVxx |  0
>   1|AAAAxx |  1
>   1|HHHHxx |  1
>   1|OOOOxx |  1
>   1|VVVVxx |  1
> (8 rows)
> 
> That's not exactly my idea of "distinct" values of string4 ---
> but apparently whoever made up the regress test thought it was OK!
> 
> Can anyone defend this feature or provide a coherent definition
> of what it's supposed to be doing?  My urge to rip it out is
> growing stronger and stronger...
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Re: [HACKERS] SELECT DISTINCT question

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Tom, any status on this DISTINCT ON ripout?

I haven't done anything about it, if that's what you mean.

I didn't get any indication of whether anyone else agreed with me
(maybe the lack of loud complaints should be taken as consent?)
        regards, tom lane


Re: [SQL] Re: [HACKERS] SELECT DISTINCT question

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Tom, any status on this DISTINCT ON ripout?
> 
> I haven't done anything about it, if that's what you mean.
> 
> I didn't get any indication of whether anyone else agreed with me
> (maybe the lack of loud complaints should be taken as consent?)

I think so.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Re: [HACKERS] SELECT DISTINCT question

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Tom, any status on this DISTINCT ON ripout?
> 
> I haven't done anything about it, if that's what you mean.
> 
> I didn't get any indication of whether anyone else agreed with me
> (maybe the lack of loud complaints should be taken as consent?)

I will wrap up the mail messages, and put it on the TODO list.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


From
"Sean Mullen"
Date:
unsubscribe