Thread: Top n queries and GROUP BY

Top n queries and GROUP BY

From
Rich Cullingford
Date:
All,
This is a straight SQL question, maybe not appropriate for a performance
list, but...

I have a simple stock holdings setup:

=> select * from t1;
  nam |    co     | num
-----+-----------+------
  joe | ibm       |  600
  abe | ibm       | 1500
  joe | cisco     | 1200
  abe | cisco     |  800
  joe | novell    |  500
  joe | microsoft |  200

What I would like to see is a Top-n-holdings-by-name", e.g, for n=2:

  nam      |  co    | num
----------+--------+-----
  joe      | cisco  |  1200
  joe      | ibm    |   600
  abe      | ibm    |  1500
  abe      | cisco  |   800

I can get part of the way by using a LIMIT clause in a subquery, e.g,

=> select 'abe', a.co, a.num from (select co, num from t1 where
nam='abe' order by num desc limit 2) as a;
  ?column? |  co   | num
----------+-------+------
  abe      | ibm   | 1500
  abe      | cisco |  800

but I can't figure out a correlated subquery (or GROUP BY arrangement or
anything else) that will cycle through the names. I vaguely remember
that these kinds or queries are hard to do in standard SQL, but I was
hoping that PG, with its extensions...

                   Thanks, Rich Cullingford
                           rculling@sysd.com



Re: Top n queries and GROUP BY

From
Rich Cullingford
Date:
Rich Cullingford wrote:
> All,
> This is a straight SQL question, maybe not appropriate for a performance
> list, but...
>
> I have a simple stock holdings setup:
>
> => select * from t1;
>  nam |    co     | num
> -----+-----------+------
>  joe | ibm       |  600
>  abe | ibm       | 1500
>  joe | cisco     | 1200
>  abe | cisco     |  800
>  joe | novell    |  500
>  joe | microsoft |  200
>
> What I would like to see is a Top-n-holdings-by-name", e.g, for n=2:
>
>  nam      |  co    | num
> ----------+--------+-----
>  joe      | cisco  |  1200
>  joe      | ibm    |   600
>  abe      | ibm    |  1500
>  abe      | cisco  |   800
>
> I can get part of the way by using a LIMIT clause in a subquery, e.g,
>
> => select 'abe', a.co, a.num from (select co, num from t1 where
> nam='abe' order by num desc limit 2) as a;
>  ?column? |  co   | num
> ----------+-------+------
>  abe      | ibm   | 1500
>  abe      | cisco |  800
>
> but I can't figure out a correlated subquery (or GROUP BY arrangement or
> anything else) that will cycle through the names. I vaguely remember
> that these kinds or queries are hard to do in standard SQL, but I was
> hoping that PG, with its extensions...

I forgot about row subqueries; for n=3, for example:

=> SELECT * FROM t1
      WHERE (nam,co,num) IN
                (SELECT nam,co,num FROM t1 b
                 where b.nam=t1.nam
                 order by num desc limit 3)
      order by nam, num desc;

  nam |   co   | num
-----+--------+------
  abe | ibm    | 1500
  abe | cisco  |  800
  joe | cisco  | 1200
  joe | ibm    |  600
  joe | novell |  500
(5 rows)

Seems to work...
                        Thanks all, Rich Cullingford
                                    rculling@sysd.com


Re: Top n queries and GROUP BY

From
Harald Fuchs
Date:
In article <3FB8F98D.1010707@sysd.com>,
Rich Cullingford <rculling@sysd.com> writes:

> All,
> This is a straight SQL question, maybe not appropriate for a performance
> list, but...

> I have a simple stock holdings setup:

> => select * from t1;
>   nam |    co     | num
> -----+-----------+------
>   joe | ibm       |  600
>   abe | ibm       | 1500
>   joe | cisco     | 1200
>   abe | cisco     |  800
>   joe | novell    |  500
>   joe | microsoft |  200

> What I would like to see is a Top-n-holdings-by-name", e.g, for n=2:

>   nam      |  co    | num
> ----------+--------+-----
>   joe      | cisco  |  1200
>   joe      | ibm    |   600
>   abe      | ibm    |  1500
>   abe      | cisco  |   800

> I can get part of the way by using a LIMIT clause in a subquery, e.g,

> => select 'abe', a.co, a.num from (select co, num from t1 where
> nam='abe' order by num desc limit 2) as a;
>   ?column? |  co   | num
> ----------+-------+------
>   abe      | ibm   | 1500
>   abe      | cisco |  800

> but I can't figure out a correlated subquery (or GROUP BY arrangement or
> anything else) that will cycle through the names. I vaguely remember
> that these kinds or queries are hard to do in standard SQL, but I was
> hoping that PG, with its extensions...

How about an outer join?

  SELECT x1.nam, x1.co, x1.num
  FROM t1 x1
  LEFT JOIN t1 x2 ON x2.nam = x1.nam AND x2.num > x1.num
  GROUP BY x1.nam, x1.co, x1.num
  HAVING count(*) < 2
  ORDER BY x1.nam, x1.num DESC