I hope this helps someone else ... I had struggled some time ago with
attempts to get a rank of values query to work, but then I gave up and
set it aside. I had another reason to attack it, and in between then and
now I learned how to return "setof" values from a function, as well as
how to construct "dynamic" queries inside a function.
Returning the top 10 values from a query is no big deal:
create table my_table (field1 integer, field2 integer, field3 text);
...
select * from my_table order by field1 limit 10;
If you want the top value for each value of field1, a DISTINCT works:
select distinct on (field1) * from my_table
order by field1, field2 desc;
(i.e., the row with the biggest field2 value for each set of rows with
the same field1 value).
However, if you want the top *N* values, it gets a lot trickier.
PostgreSQL lacks the concept of ROWNUM, which would let you filter in
just those rows in the top 5, for example.
Here's what I came up with; I've not applied this to more than a couple
of test cases, so there may very well be flaws in this approach.
create table rank_of_values(rank_of integer, the_value integer);
(The table is necessary so that our functions can return a rowtype.
Someday PostgreSQL may have a way to add a rowtype to the database
without a corresponding table.)
Now there are two functions, with the same name (and Pg isn't confused,
because they have different numbers of arguments).
The first function is a straightforward ranking of values. Given a
column name, and a "FROM" clause, it returns a set of rows with the
column value and rank (tied values do NOT have the same rank).
CREATE FUNCTION fn_rank_values(TEXT,TEXT)
RETURNS setof rank_of_values AS '
DECLARE
t RECORD;
r rank_of_values%ROWTYPE;
curs REFCURSOR;
col ALIAS FOR $1;
stmt ALIAS FOR $2;
rank INTEGER;
BEGIN
OPEN curs FOR EXECUTE 'SELECT "' || col || '" AS
"the_value" ' || stmt;
rank := 1;
LOOP
FETCH curs INTO t;
EXIT WHEN NOT FOUND;
r.rank_of = rank;
r.the_value = t.the_value;
RETURN next r;
rank := 1 + rank;
END LOOP;
CLOSE curs;
RETURN;
END;
' LANGUAGE 'plpgsql' STABLE;
The second function is the "rank-within-group" facility. Given a
value-column, a group-column, and a "FROM" clause, it returns the
original value, and its rank within the group formed by identical values
of the group-column.
CREATE FUNCTION fn_rank_values(TEXT,TEXT,TEXT)
RETURNS setof rank_of_values AS '
DECLARE
t RECORD;
r rank_of_values%ROWTYPE;
curs REFCURSOR;
col ALIAS FOR $1;
grp ALIAS FOR $2;
clause ALIAS FOR $3;
rank INTEGER;
curr_grp INTEGER;
stmt TEXT;
BEGIN
stmt := 'SELECT "' || col || '" AS "the_value", "' || grp
|| '" AS "the_group" ' || clause;
OPEN curs FOR EXECUTE stmt;
rank := 1;
LOOP
FETCH curs INTO t;
EXIT WHEN NOT FOUND;
IF curr_grp IS NULL
THEN
curr_grp = t.the_group;
ELSIF curr_grp != t.the_group
THEN
curr_grp = t.the_group;
rank = 1;
END IF;
r.rank_of = rank;
r.the_value = t.the_value;
RETURN next r;
rank := 1 + rank;
END LOOP;
CLOSE curs;
RETURN;
END;
' LANGUAGE 'plpgsql' STABLE;
Now you can do things like this:
select rank_of, the_value from fn_rank_values('field1','from mytable
order by field1') order by rank_of;
which will give you the rank number and original value of each row.
The second function is more interesting, because that's where you get
the ability to do "top N by X" kinds of queries. For instance,
select field1, field2, field3 from my_table
join fn_rank_values('field1','field2','from my_table order by field1,
field2') on (field1=the_value)
where rank_of <= 5 order by field2, rank_of;
--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise