Re: hpw to Count without group by - Mailing list pgsql-general

From Edmund Bacon
Subject Re: hpw to Count without group by
Date
Msg-id m3zmu9sne3.fsf@elb-lx.onesystem.ca
Whole thread Raw
In response to hpw to Count without group by  (Yudie Pg <yudiepg@gmail.com>)
Responses Re: hpw to Count without group by
List pgsql-general
yudiepg@gmail.com (Yudie Pg) writes:

> Hello,
> I have a table, structure like this:
>  create table product(
>  sku, int4 not null,
>  category int4 null,
>  display_name varchar(100) null,
>  rank int4 null
> )
>  let say example data:
> sku, category, display_name
> =======================
> 10001, 5, postgresql, 132
> 10002, 5, mysql, 243
> 10003, 5, oracle, 323
> 10006, 7, photoshop, 53
> 10007, 7, flash mx, 88
> 10008, 9, Windows XP, 44
>  10008, 9, Linux, 74
>  Expected query result:
>  sku, category, display_name, category_count
> ====================================
> 10001, 5, postgresql, 3
>  10006, 7, photoshop, 2
>  10008, 9, Windows XP, 2
>  The idea is getting getting highest ranking each product category and COUNT
> how many products in the category with SINGLE query.
>  the first 3 columns can be done with select distinct on (category) ...
> order by category, rank desc but it still missing the category_count. I wish
> no subquery needed for having simplest query plan.
>   Thank you.
>   Yudie G.

I do not believe you can do this without a subquery - you are trying
to get 2 separate pieces of information from your data
   * some data about the record having MAX(rank) for each category
and
   * the count of records in each category

Note, however that you can get MAX(rank) and COUNT(category) in one
sequential pass of the data: e.g
 SELECT category, MAX(rank), COUNT(category) FROM product;

Joining this with the orignal table is not too dificult :

SELECT sku, category, display_name, category_count
  FROM  product
  JOIN (SELECT category, MAX(rank) AS rank, COUNT(category) AS category_count
               FROM product
               GROUP BY category) subq
        USING(category, rank)
 ORDER BY sku;

Depending on what your data looks like, you might improve things by
having an index on category, and perhaps on (category, rank).

Note that there is may be a problem with this query: If you have more
than one product with the same rank in the same category, you may get
more than one record for that category.  Apply distinct on as
neccessary.

--
Remove -42 for email

pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: index row size 2728 exceeds btree maximum, 2713
Next
From: Terry Lee Tucker
Date:
Subject: Re: Determining when a row was inserted