Home > mailing lists

Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace - Mailing list pgsql-sql

From	Chris Bitmead
Subject	Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace
Date	January 26, 2000 20:57:14
Msg-id	388F7AE9.EDD10859@bitmead.com Whole thread Raw
In response to	Re: [SQL] DISTINCT ON: speak now or forever hold your peace (Julian Scarfe <jscarfe@callnetuk.com>)
Responses	Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace
List	pgsql-sql

Tree view

Tom Lane wrote:
>      SELECT DISTINCT ON (expression) target1, ...
> One immediate advantage of allowing an expression is that you can do
> distinct-on-two-fields in a rather klugy way, eg
>                 SELECT DISTINCT ON (field1 || ' ' || field2) ...

As long as we're fixing the syntax, I'm wondering if it wouldn't be more
logical to have DISTINCT ON somewhere later in the syntax. I'm wondering
if that might also avoid the need for () as a side effect. Like this
perhaps....

SELECT x, y, z FROM foo WHERE z DISTINCT ON x, y ORDER BY x, y;

> What I'd prefer to do is put in an
> error check that says "if you use both DISTINCT ON and ORDER BY, then
> the DISTINCT ON expression must be the first ORDER BY item".  

Better, but still a little kludgy. What about a syntax that really
supports everything you want? Like maybe...

SELECT x, y, z FROM foo DISTINCT ON x, y DESC ORDER BY z ASC;

Distinct on now has a similar syntax to the order by clause. What this
means is, do the DISTINCT ON test by ordering DESC (so you get the
distinct item with the largest value of x, y), and then order the final
result by z ascending.

Unless I'm missing something that gives everybody what they want.

HANG ON, I've got a better idea.....

The other alternative is to make DISTINCT ON a bit like GROUP BY. So you
would have
something like

SELECT x, y, max(z) AS mmm FROM foo DISTINCT ON x, y ORDER BY mmm;

Like GROUP BY where you group by the non-aggregate fields in the SELECT
clause, you use DISTINCT ON, to mention the non-aggregate clauses in the
SELECT. The aggregate clause are used to select WHICH of the resulting
fields are selected for presentation. This would have the benefit of
being more general so you could select, say the city with the longest
name in each state...

SELECT state, zipcode, longest_string(cityname) FROM cities DISTINCT ON
state ORDER BY zipcode;

I don't know how far I'm stepping here from the SQL paradigm, but it
sure seems cool....

Chris.

pgsql-sql by date:

From: Philip Warner
Date: 26 January 2000, 20:24:12
Subject: Re: [HACKERS] DISTINCT ON: speak now or forever hold your peace

From: Iain.Mott@CMIS.CSIRO.AU
Date: 26 January 2000, 21:02:14
Subject: RE: [SQL] Duplicate tuples with unique index

Re: [HACKERS] Re: [SQL] DISTINCT ON: speak now or forever hold your peace - Mailing list pgsql-sql

Previous

Next