Re: [HACKERS] SELECT DISTINCT question - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] SELECT DISTINCT question
Date
Msg-id 24413.931901508@sss.pgh.pa.us
In response to Re: [HACKERS] SELECT DISTINCT question  (Hannu Krosing <hannu@trust.ee>)
Responses Re: [SQL] Re: [HACKERS] SELECT DISTINCT question  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
Hannu Krosing <hannu@trust.ee> writes:
>> "DISTINCT will eliminate all duplicate rows from the selection.
>> DISTINCT ON column will eliminate all duplicates in the specified column;
>> this is equivalent to using GROUP BY column."

> If it is equivalent to GROUP BY then it should allow only aggregates 
> in non-distinct columns, like:
> select distinct on date date, sum(bytes) from access_log;
> If it does not, then it should be filed as a bug imho.

It does not.  Whether that is a bug is hard to say, since there is no
standard I know of that says what it *is* supposed to do.

If you look at the select_distinct_on regress test outputs, I bet you
will be even less happy:

QUERY: SELECT DISTINCT ON string4 two, string4, ten FROM tmp ORDER BY two using <, string4 using <, ten using <;
two|string4|ten
---+-------+---
  0|AAAAxx |  0
  0|HHHHxx |  0
  0|OOOOxx |  0
  0|VVVVxx |  0
  1|AAAAxx |  1
  1|HHHHxx |  1
  1|OOOOxx |  1
  1|VVVVxx |  1
(8 rows)

That's not exactly my idea of "distinct" values of string4 ---
but apparently whoever made up the regress test thought it was OK!
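
For what it's worth, the output above is what you would get if duplicates
were dropped only among *adjacent* rows after the sort, much as Unix uniq
does.  That is a guess at the mechanism, not something verified against the
executor.  A minimal sketch under that assumption follows; the sample rows
and the helper name distinct_on_adjacent are made up for illustration and
are not the real contents of tmp:

```python
from itertools import groupby

# Hypothetical sample rows (two, string4, ten); NOT the real tmp table.
rows = [
    (1, "AAAAxx", 1), (0, "HHHHxx", 0), (0, "AAAAxx", 0),
    (1, "HHHHxx", 1), (0, "AAAAxx", 0), (1, "AAAAxx", 1),
]

def distinct_on_adjacent(rows, key_index):
    """Sort on all columns (two, string4, ten), then keep the first row of
    each run of adjacent rows sharing the same value in column key_index.

    Assumption: this mimics a sort followed by an adjacent-duplicate filter;
    it is not taken from the actual implementation.
    """
    ordered = sorted(rows)
    return [next(run) for _, run in groupby(ordered, key=lambda r: r[key_index])]

result = distinct_on_adjacent(rows, key_index=1)
for row in result:
    print(row)
```

Because the sort puts "two" ahead of string4, equal string4 values under
two = 0 and two = 1 are never adjacent, so each one survives once per value
of "two".  That would account for every string4 showing up twice in the
regress output.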

Can anyone defend this feature or provide a coherent definition
of what it's supposed to be doing?  My urge to rip it out is
growing stronger and stronger...
        regards, tom lane

