Ranking values within a query (pseudo-ROWNUM) - Mailing list pgsql-general
From | Jeff Boes |
---|---|
Subject | Ranking values within a query (pseudo-ROWNUM) |
Date | |
Msg-id | 40DB1BB6.4090302@nexcerpt.com Whole thread Raw |
List | pgsql-general |
I hope this helps someone else ... I had struggled some time ago with attempts to get a rank of values query to work, but then I gave up and set it aside. I had another reason to attack it, and in between then and now I learned how to return "setof" values from a function, as well as how to construct "dynamic" queries inside a function. Returning the top 10 values from a query is no big deal: create table my_table (field1 integer, field2 integer, field3 text); ... select * from my_table order by field1 limit 10; If you want the top value for each value of field1, a DISTINCT works: select distinct on (field1) * from my_table order by field1, field2 desc; (i.e., the row with the biggest field2 value for each set of rows with the same field1 value). However, if you want the top *N* values, it gets a lot trickier. PostgreSQL lacks the concept of ROWNUM, which would let you filter in just those rows in the top 5, for example. Here's what I came up with; I've not applied this to more than a couple of test cases, so there may very well be flaws in this approach. create table rank_of_values(rank_of integer, the_value integer); (The table is necessary so that our functions can return a rowtype. Someday PostgreSQL may have a way to add a rowtype to the database without a corresponding table.) Now there are two functions, with the same name (and Pg isn't confused, because they have different numbers of arguments). The first function is a straightforward ranking of values. Given a column name, and a "FROM" clause, it returns a set of rows with the column value and rank (tied values do NOT have the same rank). CREATE FUNCTION fn_rank_values(TEXT,TEXT) RETURNS setof rank_of_values AS ' DECLARE t RECORD; r rank_of_values%ROWTYPE; curs REFCURSOR; col ALIAS FOR $1; stmt ALIAS FOR $2; rank INTEGER; BEGIN OPEN curs FOR EXECUTE 'SELECT "' || col || '" AS "the_value" ' || stmt; rank := 1; LOOP FETCH curs INTO t; EXIT WHEN NOT FOUND; r.rank_of = rank; r.the_value = t.the_value; RETURN next r; rank := 1 + rank; END LOOP; CLOSE curs; RETURN; END; ' LANGUAGE 'plpgsql' STABLE; The second function is the "rank-within-group" facility. Given a value-column, a group-column, and a "FROM" clause, it returns the original value, and its rank within the group formed by identical values of the group-column. CREATE FUNCTION fn_rank_values(TEXT,TEXT,TEXT) RETURNS setof rank_of_values AS ' DECLARE t RECORD; r rank_of_values%ROWTYPE; curs REFCURSOR; col ALIAS FOR $1; grp ALIAS FOR $2; clause ALIAS FOR $3; rank INTEGER; curr_grp INTEGER; stmt TEXT; BEGIN stmt := 'SELECT "' || col || '" AS "the_value", "' || grp || '" AS "the_group" ' || clause; OPEN curs FOR EXECUTE stmt; rank := 1; LOOP FETCH curs INTO t; EXIT WHEN NOT FOUND; IF curr_grp IS NULL THEN curr_grp = t.the_group; ELSIF curr_grp != t.the_group THEN curr_grp = t.the_group; rank = 1; END IF; r.rank_of = rank; r.the_value = t.the_value; RETURN next r; rank := 1 + rank; END LOOP; CLOSE curs; RETURN; END; ' LANGUAGE 'plpgsql' STABLE; Now you can do things like this: select rank_of, the_value from fn_rank_values('field1','from mytable order by field1') order by rank_of; which will give you the rank number and original value of each row. The second function is more interesting, because that's where you get the ability to do "top N by X" kinds of queries. For instance, select field1, field2, field3 from my_table join fn_rank_values('field1','field2','from my_table order by field1, field2') on (field1=the_value) where rank_of <= 5 order by field2, rank_of; -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise
pgsql-general by date: