Re: distinct - Mailing list pgsql-general

From Tom Lane
Subject Re: distinct
Date
Msg-id 17265.971146705@sss.pgh.pa.us
Whole thread Raw
In response to Re: distinct  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-general
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> There really isn't a spec way to do it, however,
>  select distinct on (col1) col1, col2, col3, col4 from tablename;
> would get you one row for each distinct col1, but it's undefined what
> row that gets you.

You can get a well-defined result with DISTINCT ON if you sort by
*all* the columns, because we specify that DISTINCT ON yields the
first row of each group that are the same in the DISTINCT-ON'd columns.
So for example,

select distinct on (col1, col2) col1, col2, col3, col4 from tablename
order by col1, col2, col3, col4 desc;

would give you the minimum col3, and then the maximum col4 for that col3
value, within the rows having any particular col1 & col2 values.

This is a big kluge, no question, but there are cases where it gives
you just what you want and there's no good way to get the same result
in standard SQL.  For example, the above query is NOT equivalent to

select col1, col2, min(col3), max(col4) from tablename
group by col1, col2;

That'll give you the same col1, col2, and col3 results --- but the col4
maxima will be across the whole of each col1/col2 group, and will not
necessarily come from the same rows the col3 values come from.

There was a nice example of an application where this was Just The Right
Thing in the discussions that led up to DISTINCT ON being cleaned up,
rather than axed, for 7.0.  It was either pgsql-general or pgsql-sql,
I forget, but check the archives for discussion of DISTINCT ON between
6.5 and 7.0 release dates.

            regards, tom lane

pgsql-general by date:

Previous
From: "Ruslan P. Yawdoshak"
Date:
Subject: Question !
Next
From: Neil Conway
Date:
Subject: Re: Postgres 7.1 and the 8k tuple / row limit