Re: Obtaining random rows from a result set - Mailing list pgsql-general

From Erik Jones
Subject Re: Obtaining random rows from a result set
Date
Msg-id 1F87CA98-2DFD-4551-812C-1B8F55506D09@myemma.com
Whole thread Raw
In response to Re: Obtaining random rows from a result set  (Kaloyan Iliev <kaloyan@digsys.bg>)
List pgsql-general
On Aug 31, 2007, at 8:34 AM, Kaloyan Iliev wrote:

> Alban Hertroys wrote:
>
>> Hello,
>>
>> I've recently been busy improving a query that yields a fixed
>> number of
>> random records matching certain conditions. I have tried all the
>> usual
>> approaches, and although they do work, they're all limited in some
>> way
>> and don't translate really well to what you "want". They're
>> kludges, IMHO.
>>
>> The methods I've tried are explained quite well on
>> http://people.planetpostgresql.org/greg/index.php?/archives/40-
>> Getting-random-rows-from-a-database-table.html
>>
>> All these methods involve calculating a random number for every
>> record
>> in the result set at some point in time, which is really not what I'm
>> trying to model. I think the database should provide some means to
>> get
>> those records, so...
>>
>> Dear Santa,
>>
>> I'd like my database to have functionality analogue to how LIMIT
>> works,
>> but for other - non-sequential - algorithms.
>>
>> I was thinking along the lines of:
>>
>>     SELECT *
>>       FROM table
>>      WHERE condition = true
>>      RANDOM 5;
>>
>> Which would (up to) return 5 random rows from the result set, just as
>> LIMIT 5 returns (up to) the first 5 records in the result set.
>>
>>
>> Or maybe even with a custom function, so that you could get non-
>> linear
>> distributions:
>>
>>     SELECT *
>>       FROM table
>>      WHERE condition = true
>>      LIMIT 5 USING my_func();
>>
>>     Where my_func() could be a user definable function accepting a
>> number
>> that should be (an estimate of?) the number of results being
>> returned so
>> that it can provide pointers to which rows in the resultset will be
>> returned from the query.
>>
>> Examples:
>> * random(maxrows) would return random rows from the resultset.
>> * median() would return the rows in the middle of the result set
>> (this
>> would require ordering to be meaningful).
>>
>> What do people think, is this feasable? Desirable? Necessary?
>>
>> If I'd have time I'd volunteer for at least looking into this, but
>> I'm
>> working on three projects simultaneously already. Alas...
>>
>> Regards,
>> Alban Hertroys.
>>
>>
> Hi,
> Why not generate a random number in your application and then:
>
> SELECT *
> FROM table_x
> WHERE condition = true
> OFFSET generated_random_number
> LIMIT xx
>
> Kaloyan Iliev
>

That won't work without some kind of a priori knowledge of how many
rows the query would return without the offset and limit.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



pgsql-general by date:

Previous
From: Erik Jones
Date:
Subject: Re: URGENT: Whole DB down ("no space left on device")
Next
From: "Marco Bizzarri"
Date:
Subject: computing and updating the size of a table with large objects