Thread: SELECT DISTINCT question
I got a problem with query: select distinct (date), bytes from access_log; which works but not as I expect. I thought this query will select all rows with distinct values of 'date' column, but it get distinct pairs 'date, bytes' . From documnetation I see "DISTINCT will eliminate all duplicate rows from the selection. DISTINCT ON column will eliminate all duplicates in the specified column; this is equivalent to using GROUP BY column. ALL will return all candidate rows, including duplicates." discovery=> select distinct on date,bytes from access_log; ERROR: parser: parse error at or near "," Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > discovery=> select distinct on date,bytes from access_log; > ERROR: parser: parse error at or near "," The syntax for SELECT DISTINCT ON is just as brain-damaged as the functionality itself: there's no comma after the column name. You want select distinct on date date,bytes from access_log; The reason the functionality is brain-damaged is that there's no way to know which tuple out of the set of tuples with a given "date" value will be the one returned. SELECT DISTINCT ON is not in SQL92, and I think it shouldn't be in Postgres either... regards, tom lane
On Sat, 10 Jul 1999, Tom Lane wrote: > Date: Sat, 10 Jul 1999 17:18:28 -0400 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: Oleg Bartunov <oleg@sai.msu.su> > Cc: hackers@postgreSQL.org, pgsql-sql@postgreSQL.org > Subject: [SQL] Re: [HACKERS] SELECT DISTINCT question > > Oleg Bartunov <oleg@sai.msu.su> writes: > > discovery=> select distinct on date,bytes from access_log; > > ERROR: parser: parse error at or near "," > > The syntax for SELECT DISTINCT ON is just as brain-damaged as the > functionality itself: there's no comma after the column name. > You want > > select distinct on date date,bytes from access_log; > thanks, this works. But why parser complains about such query: discovery=> select distinct on a.date a.date, a.bytes from access_log a; ERROR: parser: parse error at or near "." In this query I could just omit '.', but in more complex query I probably could need one. > The reason the functionality is brain-damaged is that there's no way to > know which tuple out of the set of tuples with a given "date" value will > be the one returned. > > SELECT DISTINCT ON is not in SQL92, and I think it shouldn't be in > Postgres either... I'm not an SQL expert, but if it works and this feature is in standard but only syntax is diffrent, why just not to use standard select distinct(date), bytes from access_log; Or I'm missing here ? Regards, Oleg > > regards, tom lane > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov wrote: > > I got a problem with query: > > select distinct (date), bytes from access_log; > > which works but not as I expect. I thought this query will select > all rows with distinct values of 'date' column, but it get > distinct pairs 'date, bytes' . From documnetation I see > > "DISTINCT will eliminate all duplicate rows from the selection. > DISTINCT ON column will eliminate all duplicates in the specified column; > this is equivalent to using GROUP BY column. If it is equivalent to GROUP BY then it should allow only aggregates in non-distinct columns, like: select distinct on date date, sum(bytes) from access_log; If it does not, then it should be files as a bug imho. ----------------- Hannu
Hannu Krosing <hannu@trust.ee> writes: >> "DISTINCT will eliminate all duplicate rows from the selection. >> DISTINCT ON column will eliminate all duplicates in the specified column; >> this is equivalent to using GROUP BY column." > If it is equivalent to GROUP BY then it should allow only aggregates > in non-distinct columns, like: > select distinct on date date, sum(bytes) from access_log; > If it does not, then it should be files as a bug imho. It does not. Whether that is a bug is hard to say, since there is no standard I know of that says what it *is* supposed to do. If you look at the select_distinct_on regress test outputs, I bet you will be even less happy: QUERY: SELECT DISTINCT ON string4 two, string4, ten FROM tmp ORDER BY two using <, string4 using <, ten using <; two|string4|ten ---+-------+--- 0|AAAAxx | 0 0|HHHHxx | 0 0|OOOOxx | 0 0|VVVVxx | 0 1|AAAAxx | 1 1|HHHHxx | 1 1|OOOOxx | 1 1|VVVVxx| 1 (8 rows) That's not exactly my idea of "distinct" values of string4 --- but apparently whoever made up the regress test thought it was OK! Can anyone defend this feature or provide a coherent definition of what it's supposed to be doing? My urge to rip it out is growing stronger and stronger... regards, tom lane
Tom, any status on this DISTINCT ON ripout? > Hannu Krosing <hannu@trust.ee> writes: > >> "DISTINCT will eliminate all duplicate rows from the selection. > >> DISTINCT ON column will eliminate all duplicates in the specified column; > >> this is equivalent to using GROUP BY column." > > > If it is equivalent to GROUP BY then it should allow only aggregates > > in non-distinct columns, like: > > select distinct on date date, sum(bytes) from access_log; > > If it does not, then it should be files as a bug imho. > > It does not. Whether that is a bug is hard to say, since there is no > standard I know of that says what it *is* supposed to do. > > If you look at the select_distinct_on regress test outputs, I bet you > will be even less happy: > > QUERY: SELECT DISTINCT ON string4 two, string4, ten > FROM tmp > ORDER BY two using <, string4 using <, ten using <; > two|string4|ten > ---+-------+--- > 0|AAAAxx | 0 > 0|HHHHxx | 0 > 0|OOOOxx | 0 > 0|VVVVxx | 0 > 1|AAAAxx | 1 > 1|HHHHxx | 1 > 1|OOOOxx | 1 > 1|VVVVxx | 1 > (8 rows) > > That's not exactly my idea of "distinct" values of string4 --- > but apparently whoever made up the regress test thought it was OK! > > Can anyone defend this feature or provide a coherent definition > of what it's supposed to be doing? My urge to rip it out is > growing stronger and stronger... > > regards, tom lane > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > Tom, any status on this DISTINCT ON ripout? I haven't done anything about it, if that's what you mean. I didn't get any indication of whether anyone else agreed with me (maybe the lack of loud complaints should be taken as consent?) regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > Tom, any status on this DISTINCT ON ripout? > > I haven't done anything about it, if that's what you mean. > > I didn't get any indication of whether anyone else agreed with me > (maybe the lack of loud complaints should be taken as consent?) I think so. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > Tom, any status on this DISTINCT ON ripout? > > I haven't done anything about it, if that's what you mean. > > I didn't get any indication of whether anyone else agreed with me > (maybe the lack of loud complaints should be taken as consent?) I will wrap up the mail messages, and put it on the TODO list. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026