Re: Obtaining random rows from a result set - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Obtaining random rows from a result set
Date
Msg-id 46DD7CBF.7040609@magproductions.nl
Whole thread Raw
In response to Re: Obtaining random rows from a result set  (Alban Hertroys <alban@magproductions.nl>)
Responses Re: Obtaining random rows from a result set
List pgsql-general
To follow up on my own post, I came up with a workable solution based on
scrolling cursors. The SP approach didn't work out for me, I didn't
manage to declare a cursor in PL/pgSQL that could be positioned
absolutely (maybe that's due to us still using PG 8.1.something?).

A solution to that would be appreciated.

Anyway, I solved the problem in our application (PHP). I even got a
workable solution to prevent returning the same record more than once.
Here goes:

function randomSet($query, $limit, $uniqueColumn) {

    // queries; depends on your DB connector
    DECLARE _cur SCROLL CURSOR WITHOUT HOLD FOR $query;
    MOVE FORWARD ALL IN _cur;

    //GET DIAGNOSTICS _count := ROW_COUNT;
    $count = pg_affected_rows();

    $uniques = array();
    $resultSet = array();
    while ($limit > 0 && count($uniques) < $count) {
        $idx = random(1, $count);

        //query
        $record = FETCH ABSOLUTE $idx FROM _cur;

        // Skip records with a column value we want to be unique
        if (in_array($record[$uniqueColumn], $uniques)
            continue;

        $uniques[] = $record[$uniqueColumn];
        $resultSet[] = $record;
        $limit--;
    }

    // query
    CLOSE _cur;

    return $resultSet;
}

I hope this is useful to anyone. It worked for us; it is definitely
faster than order by random(), and more random than precalculated column
values. Plus it translates directly to what we are requesting :)

Alban Hertroys wrote:
> I thought of another solution (with only a few calculations of random())
> that can be deployed in existing versions of PG, using a set-returning
> function with a scrolling cursor that accepts the query string as input
> like this (in pseudoish-code):
>
> ----
> create function random(text _query, integer _limit)
> returns set
> volatile
> as $$
> DECLARE
>     _cur cursor;
>     _cnt bigint;
>     _idx integer;
>     _rowpos bigint;
>
>     _rec record;
> BEGIN
>     open _cur for execute query;
>     fetch forward all into _rec;
>     -- select total nr of records into _cnt
>
>     for _idx in 1.._limit loop
>         _rowpos := random() * _cnt;
>
>         fetch absolute _rowpos into _rec;
>         return next _rec;
>     end loop;
>
>     return;
> END;
> $$
> language 'plpgsql';
> ----

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Statistics collection question
Next
From: Alban Hertroys
Date:
Subject: Re: Obtaining random rows from a result set