Fw: Selecting random rows using weights - Mailing list pgsql-novice

From Masaru Sugawara
Subject Fw: Selecting random rows using weights
Date
Msg-id 20020519132333.6BB4.RK73@sea.plala.or.jp
Whole thread Raw
List pgsql-novice
On Sun, 12 May 2002 13:24:17 -0600
"G" <gerard@interfold.com> wrote:

>
> Is there a way to select rows randomly using weight columns? (weighted
> random values)
>

I find out how to get what you want at random; however it's for only a row.
If you need ROWS, it's impossible to select them by using these functions
at least. Probably, another approach will be need.


-- First --
-- An id column must be unique.
create table tbl_random(id int4 unique not null, weight int4);
insert into tbl_random values(1, 1);
insert into tbl_random values(2, 10);
insert into tbl_random values(3, 0);
insert into tbl_random values(4, 3);
ALTER TABLE tbl_random ADD COLUMN r_start int4;
ALTER TABLE tbl_random ADD COLUMN r_end   int4;
CREATE INDEX idx_random_r_start ON tbl_random(r_start);
CREATE INDEX idx_random_r_end   ON tbl_random(r_end);

-- Second --
CREATE OR REPLACE FUNCTION fn_update_random() RETURNS boolean AS '
   DECLARE
        rec RECORD;
        range int4 :=1;
   BEGIN
        FOR rec IN SELECT * FROM tbl_random WHERE weight > 0 LOOP
           UPDATE tbl_random SET r_start = range,
                     r_end = weight + range - 1
                      WHERE id = rec.id;
           SELECT INTO range (r_end + 1) FROM tbl_random
                      WHERE id = rec.id;
        END LOOP;
        FOR rec IN SELECT * FROM tbl_random WHERE weight = 0 LOOP
           UPDATE tbl_random SET r_start = 0, r_end = 0
                     WHERE id = rec.id;
        END LOOP;
   RETURN true;
   END;
' LANGUAGE 'plpgsql';

-- Third  --
-- This query for updating the weighted range of r_start to r_end
-- needs to execute after rows are inserted into a target table,
-- ones are deleted from it, or the "weight" column  in it is updated.
SELECT fn_update_random();

-- 4th --
-- A random number(0 to 1) is scaled by the maximum number of r_end, which
-- must be put into the subselect not to internally execute the random()
-- function twice. And if its scaled number is within the weighted range,
-- the unique row having the range will be selected.
SELECT t1.*, t2.r
   FROM tbl_random AS t1,
        (SELECT ceil(max(r_end) * (SELECT random())) AS r
            FROM tbl_random) AS t2
   WHERE t1.r_start <= t2.r AND t2.r <= t1.r_end;

-- Here is a bad example.
SELECT t1.*
   FROM tbl_random AS t1
   WHERE (SELECT ceil(max(r_end) * (SELECT random())) FROM tbl_random)
          BETWEEN t1.r_start AND t1.r_end;


Regards,
Masaru Sugawara



pgsql-novice by date:

Previous
From: Tom Lane
Date:
Subject: Re: ADD CONSTRAINT NOT NULL, how?
Next
From: thiemo
Date:
Subject: Forgotten the master password of db