Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace - Mailing list pgsql-sql

From Chris Bitmead
Subject Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace
Date
Msg-id 388F7AE9.EDD10859@bitmead.com
Whole thread Raw
In response to Re: [SQL] DISTINCT ON: speak now or forever hold your peace  (Julian Scarfe <jscarfe@callnetuk.com>)
Responses Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Tom Lane wrote:
>      SELECT DISTINCT ON (expression) target1, ...
> One immediate advantage of allowing an expression is that you can do
> distinct-on-two-fields in a rather klugy way, eg
>                 SELECT DISTINCT ON (field1 || ' ' || field2) ...

As long as we're fixing the syntax, I'm wondering if it wouldn't be more
logical to have DISTINCT ON somewhere later in the syntax. I'm wondering
if that might also avoid the need for () as a side effect. Like this
perhaps....

SELECT x, y, z FROM foo WHERE z DISTINCT ON x, y ORDER BY x, y;

> What I'd prefer to do is put in an
> error check that says "if you use both DISTINCT ON and ORDER BY, then
> the DISTINCT ON expression must be the first ORDER BY item".  

Better, but still a little kludgy. What about a syntax that really
supports everything you want? Like maybe...

SELECT x, y, z FROM foo DISTINCT ON x, y DESC ORDER BY z ASC;

Distinct on now has a similar syntax to the order by clause. What this
means is, do the DISTINCT ON test by ordering DESC (so you get the
distinct item with the largest value of x, y), and then order the final
result by z ascending.

Unless I'm missing something that gives everybody what they want.

HANG ON, I've got a better idea.....

The other alternative is to make DISTINCT ON a bit like GROUP BY. So you
would have
something like

SELECT x, y, max(z) AS mmm FROM foo DISTINCT ON x, y ORDER BY mmm;

Like GROUP BY where you group by the non-aggregate fields in the SELECT
clause, you use DISTINCT ON, to mention the non-aggregate clauses in the
SELECT. The aggregate clause are used to select WHICH of the resulting
fields are selected for presentation. This would have the benefit of
being more general so you could select, say the city with the longest
name in each state...

SELECT state, zipcode, longest_string(cityname) FROM cities DISTINCT ON
state ORDER BY zipcode;

I don't know how far I'm stepping here from the SQL paradigm, but it
sure seems cool....

Chris.


pgsql-sql by date:

Previous
From: Philip Warner
Date:
Subject: Re: [HACKERS] DISTINCT ON: speak now or forever hold your peace
Next
From: Iain.Mott@CMIS.CSIRO.AU
Date:
Subject: RE: [SQL] Duplicate tuples with unique index