Thread: Ranking values within a query (pseudo-ROWNUM)

Ranking values within a query (pseudo-ROWNUM)

From
Jeff Boes
Date:
I hope this helps someone else ... I had struggled some time ago with
attempts to get a rank of values query to work, but then I gave up and
set it aside. I had another reason to attack it, and in between then and
now I learned how to return "setof" values from a function, as well as
how to construct "dynamic" queries inside a function.

Returning the top 10 values from a query is no big deal:

   create table my_table (field1 integer, field2 integer, field3 text);
   ...
   select * from my_table order by field1 limit 10;

If you want the top value for each value of field1, a DISTINCT works:

   select distinct on (field1) * from my_table
   order by field1, field2 desc;

(i.e., the row with the biggest field2 value for each set of rows with
the same field1 value).

However, if you want the top *N* values, it gets a lot trickier.
PostgreSQL lacks the concept of ROWNUM, which would let you filter in
just those rows in the top 5, for example.

Here's what I came up with; I've not applied this to more than a couple
of test cases, so there may very well be flaws in this approach.

create table rank_of_values(rank_of integer, the_value integer);

(The table is necessary so that our functions can return a rowtype.
Someday PostgreSQL may have a way to add a rowtype to the database
without a corresponding table.)

Now there are two functions, with the same name (and Pg isn't confused,
because they have different numbers of arguments).

The first function is a straightforward ranking of values. Given a
column name, and a "FROM" clause, it returns a set of rows with the
column value and rank (tied values do NOT have the same rank).

CREATE FUNCTION fn_rank_values(TEXT,TEXT)
RETURNS setof rank_of_values AS '
     DECLARE
         t RECORD;
         r rank_of_values%ROWTYPE;
         curs REFCURSOR;
         col ALIAS FOR $1;
         stmt ALIAS FOR $2;
         rank INTEGER;
     BEGIN
         OPEN curs FOR EXECUTE 'SELECT "' || col || '" AS
        "the_value" ' || stmt;
         rank := 1;
         LOOP
             FETCH curs INTO t;
             EXIT WHEN NOT FOUND;
             r.rank_of = rank;
             r.the_value = t.the_value;
             RETURN next r;
             rank := 1 + rank;
         END LOOP;
         CLOSE curs;
         RETURN;
     END;
' LANGUAGE 'plpgsql' STABLE;

The second function is the "rank-within-group" facility. Given a
value-column, a group-column, and a "FROM" clause, it returns the
original value, and its rank within the group formed by identical values
of the group-column.

CREATE FUNCTION fn_rank_values(TEXT,TEXT,TEXT)
RETURNS setof rank_of_values AS '
     DECLARE
         t RECORD;
         r rank_of_values%ROWTYPE;
         curs REFCURSOR;
         col ALIAS FOR $1;
         grp ALIAS FOR $2;
         clause ALIAS FOR $3;
         rank INTEGER;
         curr_grp INTEGER;
         stmt TEXT;
     BEGIN
         stmt := 'SELECT "' || col || '" AS "the_value", "' || grp
             || '" AS "the_group" ' || clause;
         OPEN curs FOR EXECUTE stmt;
         rank := 1;
         LOOP
             FETCH curs INTO t;
             EXIT WHEN NOT FOUND;
             IF curr_grp IS NULL
             THEN
                 curr_grp = t.the_group;
             ELSIF curr_grp != t.the_group
             THEN
                 curr_grp = t.the_group;
                 rank = 1;
             END IF;
             r.rank_of = rank;
             r.the_value = t.the_value;
             RETURN next r;
             rank := 1 + rank;
         END LOOP;
         CLOSE curs;
         RETURN;
     END;
' LANGUAGE 'plpgsql' STABLE;

Now you can do things like this:

select rank_of, the_value from fn_rank_values('field1','from mytable
order by field1') order by rank_of;

which will give you the rank number and original value of each row.

The second function is more interesting, because that's where you get
the ability to do "top N by X" kinds of queries. For instance,

select field1, field2, field3 from my_table
join fn_rank_values('field1','field2','from my_table order by field1,
field2') on (field1=the_value)
where rank_of <= 5 order by field2, rank_of;


--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
            ...Nexcerpt... Extend your Expertise