Thread: Unique - first

Unique - first

From
Robert James
Date:
I have a table (x,y,z) - I'd like to take the rows with unique x
values - but, when more than one row have the same x value, I want the
one with the minimal z value.

How can I do that? I can imagine doing it with window functions, but
also that regular SQL should be able to do it too.


Re: Unique - first

From
Thomas Kellerer
Date:
Robert James wrote on 27.10.2013 14:04:
> I have a table (x,y,z) - I'd like to take the rows with unique x
> values - but, when more than one row have the same x value, I want the
> one with the minimal z value.
>
> How can I do that? I can imagine doing it with window functions, but
> also that regular SQL should be able to do it too.
>
>

Window functions *are* "regular" SQL ;)

select x,y,z
from (
   select x,y,z,
          min(y) over (partition by x) as min_y
   from the_table
) t
where y = min_y;

Instead of min() you could also use row_number() or dense_rank() to find the minimum value.

A solution without window functions could be something like:

select t1.x, t1.y, t1.z
from table t1
   join (select t2.x, min(t2.y) as min_y
         from the_table t2
         group by t2.x
   ) mt on mt.x = t1.x and mt.min_y = t1.y;

But I'm pretty sure the solution with the window function will perform better.





Re: Unique - first

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Robert James wrote on 27.10.2013 14:04:
>> I have a table (x,y,z) - I'd like to take the rows with unique x
>> values - but, when more than one row have the same x value, I want the
>> one with the minimal z value.
>>
>> How can I do that? I can imagine doing it with window functions, but
>> also that regular SQL should be able to do it too.

> Window functions *are* "regular" SQL ;)

Indeed.  The only other easy way I know of involves SELECT DISTINCT ON
(see the "weather reports" example in the SELECT reference page); but
that is most definitely not standard SQL, it is a Postgres-ism.

> A solution without window functions could be something like:

> select t1.x, t1.y, t1.z
> from table t1
>    join (select t2.x, min(t2.y) as min_y
>          from the_table t2
>          group by t2.x
>    ) mt on mt.x = t1.x and mt.min_y = t1.y;

Note that this doesn't work unless x and y form a primary key, else you
get multiple join rows (or no join rows, if one is NULL).  In any case,
it's unlikely to be fast.

I think I've seen some even more esoteric solutions that use only
SQL-92-era features, but lack of caffeine prevents me from recalling them.
In any case, there's a good reason why we invented SELECT DISTINCT ON:
this is just not easy to do in minimal SQL.

            regards, tom lane


Re: Unique - first

From
Rowan Collins
Date:
On 27/10/2013 13:04, Robert James wrote:
> I have a table (x,y,z) - I'd like to take the rows with unique x
> values - but, when more than one row have the same x value, I want the
> one with the minimal z value.
>
> How can I do that? I can imagine doing it with window functions, but
> also that regular SQL should be able to do it too.

My personal favourite approach, assuming you can rely on Postgres 9.0 or
higher, is to define a first() aggregate as shown here:
http://wiki.postgresql.org/wiki/First/last_%28aggregate%29

Once created, this can be used with the order_by_clause of the aggregate
expression as shown here:
http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES
(That's why it requires 9.0, since earlier versions had no such clause).

So in your case, you could run

SELECT
    x,
    first(y order by z) as y_with_lowest_z
FROM
    xyzzy
GROUP BY
    x

I find this a lot easier to understand than window functions; I've no
idea how its performance compares.

--
Rowan Collins
[IMSoP]



Re: Unique - first

From
Marcin Mańk
Date:

On Sun, Oct 27, 2013 at 2:04 PM, Robert James <srobertjames@gmail.com> wrote:
I have a table (x,y,z) - I'd like to take the rows with unique x
values - but, when more than one row have the same x value, I want the
one with the minimal z value.

You can use distinct on (which is a Postgresql extension to the SQL standard):

select distinct on(x) x, y, z
from the_table
order by x, z
Regards
Marcin Mańk