Home > mailing lists

Re: How pull - Mailing list pgsql-sql

From	Filip Rembiałkowski
Subject	Re: How pull
Date	September 23, 2007 22:15:55
Msg-id	92869e660709231815pe28db6cp6d1f132f16f2c181@mail.gmail.com Whole thread Raw
In response to	How pull ("Matt Magoffin" <postgresql.org@msqr.us>)
Responses	Re: How pull
List	pgsql-sql

Tree view

On 23/09/2007, Matt Magoffin <postgresql.org@msqr.us> wrote:
> Hello,
>
> My SQL skills are limited and I'm struggling with a query where I want to
> return a single item of an aggregate join. The query looks like this:
>
> select
>   (case
>     when agg.avg_rating is null then 0.0
>     when agg.avg_rating < 0.75 then 0.5
>     when agg.avg_rating < 1.25 then 1.0
>     when agg.avg_rating < 1.75 then 1.5
>     when agg.avg_rating < 2.25 then 2.0
>     when agg.avg_rating < 2.75 then 2.5
>     when agg.avg_rating < 3.25 then 3.0
>     when agg.avg_rating < 3.75 then 3.5
>     when agg.avg_rating < 4.25 then 4.0
>     when agg.avg_rating < 4.75 then 4.5
>     else 5.0
>   end) as avg_rating,
>   count(item.itemid) as item_count
> from media_item item
> inner join (
>   select rating.mediaitem_userrating_hjid as ritemid,
>     avg(rating.rating) as avg_rating
>   from media_item_rating rating, media_item item
>   where rating.mediaitem_userrating_hjid = item.itemid
>   group by rating.mediaitem_userrating_hjid
> ) as agg
> on item.itemid = agg.ritemid
> group by avg_rating
> order by avg_rating desc
>
> and a sample of results is this:
>
>  avg_rating | item_count
> ------------+------------
>         5.0 |         21
>         4.0 |         33
>         3.0 |         13
>         2.0 |          4
>         1.0 |          1
>
> What I want as well is the ID of the item (and possibly it's avg_rating
> value) from the "agg" join with the highest avg_rating for each output
> row... something like this
>
>  avg_rating | item_count | item_id | item_rating
> ------------+-----------------------------------
>         5.0 |         21 | 109890  | 4.9
>         4.0 |         33 | 89201   | 4.1
>         3.0 |         13 | 119029  | 2.8
>         2.0 |          4 | 182999  | 2.2
>         1.0 |          1 | 1929    | 1.0
>
> So the intention in this example is that item #109890 has an average
> rating of 4.9 and that is the highest rating within the > 4.75 rating
> group.
>
> If anyone had any tips I'd greatly appreciate it.
>

create ranking function to make queries look simpler:

create or replace function ranking_group(numeric) returns numeric as
$$ select case when $1 < 0.3456 then 'quite small' ...
end $$ language sql immutable;  (I'd make it STRICT, but you allow null rankings)

1st way: DISTINCT ON + subquery

select *, (select count(*) from rating where rating_group(rating) =
subq.rating_group ) as rating_group_size
from (
select distinct on (rating_group)   rating_group(r.rating),   r.item_id as best_rated_item_id,   r.rating as
best_rating
from rating r
order by rating_group desc, r.rating desc
) subq;


2nd way (faster - actually 2 x faster)
using FIRST aggregate to calculate all in one pass

create function first(numeric,numeric) returns numeric as 'select $1'
language sql immutable strict;
create function first(integer,integer) returns integer as 'select $1'
language sql immutable strict;
create aggregate first( integer ) ( SFUNC = first, STYPE = integer );
create aggregate first ( numeric ) ( SFUNC = first, STYPE = numeric );

select rating_group(rating), count(*) as num_ratings, first(item_id) as best_rated_item_id, first(rating) as
best_rating
from ( select * from rating order by rating desc ) ordered_ratings
group by rating_group
order by rating_group desc;


note: if you can, get rid of null ratings. what are they supposed to
mean? they make things a bit more complicated.



--
Filip Rembiałkowski

pgsql-sql by date:

From: "Matt Magoffin"
Date: 23 September 2007, 19:46:07
Subject: How pull

From: Erik Jones
Date: 24 September 2007, 01:56:52
Subject: Re: Many databases

Re: How pull - Mailing list pgsql-sql

Previous

Next