Home > mailing lists

Re: ORDER BY and DISTINCT ON - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: ORDER BY and DISTINCT ON
Date	December 15, 2003 13:53:51
Msg-id	12581.1071500003@sss.pgh.pa.us Whole thread Raw
In response to	Re: ORDER BY and DISTINCT ON (Bruno Wolff III <bruno@wolff.to>)
List	pgsql-hackers

Tree view

Bruno Wolff III <bruno@wolff.to> writes:
> Specifically the interpretation I think makes sense is that
> SELECT DISTINCT ON (a, b, c) * FROM tablename ORDER BY d, e, f
> should be treated as equvialent to
> SELECT * FROM
>   (SELECT DISTINCT ON (a, b, c) FROM tablename ORDER BY a, b, c, d, e, f) AS t
>   ORDER BY d, e, f

Right, that's the same thing Neil said elsewhere in the thread.  I guess
I agree that this would be cleaner, and also that it doesn't seem like a
high priority problem.

BTW, you could imagine implementing DISTINCT and DISTINCT ON by a hash
table, if you didn't expect too many distinct rows.  The hash key would
be the DISTINCT columns, and in the DISTINCT ON case, when you get a new
row with DISTINCT columns matching an existing entry, you compare the
two rows using the ORDER BY key columns to decide which row to keep.  In
this formulation it's fairly obvious that the DISTINCT and ORDER BY keys
don't need to overlap at all.  This technique does not require presorted
input (good) but also delivers unsorted output (bad).  It could still be
a big win if you expect the DISTINCT filtering to reduce the number of
rows substantially, because you'd end up sorting a much smaller number
of rows.  I didn't get around to trying to implement this when I was
doing hash aggregation for 7.4, but it seems like a reasonable item for
the TODO list.
        regards, tom lane

pgsql-hackers by date:

From: Bruno Wolff III
Date: 15 December 2003, 13:04:17
Subject: Re: ORDER BY and DISTINCT ON

From: Andrew Dunstan
Date: 15 December 2003, 17:23:44
Subject: Re: [PATCHES] fork/exec patch

Re: ORDER BY and DISTINCT ON - Mailing list pgsql-hackers

Previous

Next