getting the ranks out of items with SHARED - Mailing list pgsql-general

From Janning Vygen
Subject getting the ranks out of items with SHARED
Date
Msg-id 200507121048.52113.vygen@gmx.de
Whole thread Raw
Responses Re: getting the ranks out of items with SHARED
List pgsql-general
Hi,

in postgresql you have several possibilites to get the rank of items. A thread
earlier this year shows correlated subqueries (not very performant) and other
tricks and techniques to solve the ranking problem:

http://archives.postgresql.org/pgsql-general/2005-05/msg00157.php

The possibility to use a SHARED variable in plperl can be another nice way to
get a rank of items. ( good example showing SHARED in use is at
http://www.varlena.com/varlena/GeneralBits/114.php)

So i wrote two functions for my purpose:

CREATE OR REPLACE function ranking(int4) RETURNS int4 LANGUAGE 'plperl' AS $$
  my %this;

  $this{'punkte'}  = shift;
  $this{'ranking'} = $_SHARED{'prev'}{'ranking'}; # defaults to 0
  $this{'count'}   = $_SHARED{'prev'}{'count'} + 1; # defaults to 1

  $this{'ranking'} = $this{'count'} unless
    $this{'punkte'} == $_SHARED{'prev'}{'punkte'};

  $_SHARED{'prev'} = \%this;
  return $this{'ranking'};
$$;

CREATE OR REPLACE FUNCTION reset_ranking() RETURNS void LANGUAGE 'plperl' AS
$$
  $_SHARED{'prev'} = undef;
$$;


Nice Thing: the function drops rankings which other ranking solutions in the
given thread can't. Like this:

rank | points
-------------
1    | 10
2    |  9
2    |  9
4    |  8
5    |  7

It drops rank 3 because we have to entries for second rank.

It would be even nice if you can write a genric ranking() function which takes
anyarray as an argument, but as far as i know you can't pass an "anyarray" to
a plperl function, right?


Now i can do the following in plpsql Function which updates a caching table
for me and it works fine:

PERFORM reset_ranking();

CREATE TEMP TABLE ranking AS
SELECT
  *,
  ranking(r1.gc_gesamtpunkte) AS rank
FROM (
  SELECT
    mg_name,
    gc_gesamtpunkte
    FROM temp_gc
    ORDER BY gc_gesamtpunkte DESC, mg_name ASC
) AS r1
;

EXECUTE '
      UPDATE temp_gc
      SET gc_rank = ranking.rank
      FROM ranking
      WHERE temp_gc.mg_name = ranking.mg_name;
';

DROP TABLE ranking;


Problems arrise when you try to do the select and update step together without
any temporary table in between:

PERFORM reset_ranking();

UPDATE temp_gc SET gc_rank = ranking.rank
FROM (
  SELECT
    *,
    ranking(r1.gc_gesamtpunkte) AS rank
  FROM (
    SELECT
      mg_name,
      gc_gesamtpunkte
    FROM temp_gc
    ORDER BY gc_gesamtpunkte DESC, mg_name ASC
  ) AS r1
) AS ranking
WHERE temp_gc.mg_name = ranking.mg_name;
';

I have a guess, what happens here: The order of the subselect statement is
dropped by the optimizer because the optimizer doesn't see the "side-effect"
of the ranking function. that's ok because using such functions isn't SQLish,
i guess.

Is there a way to FORCE the optimizer to keep things orders like the sql
statement author wanted it?

kind regards,
janning


pgsql-general by date:

Previous
From: Harry Mantheakis
Date:
Subject: Re: Japanese words not distinguished
Next
From: Joe
Date:
Subject: Re: Converting MySQL tinyint to PostgreSQL