Thread: Selecting random row
Hi everyone,

I am facing a performance problem with selecting a random row from a table. I do it like this:

  SELECT * FROM table ORDER BY random() LIMIT 1

But it seems that the random() function is quite slow, and the backend apparently evaluates the function for all rows (which is logical). On a table with a few thousand rows it takes seconds. Does anybody know a better solution for selecting one (or more) random rows from a table?

Thanks for any ideas,
Michal
On Fri, 2002-08-02 at 08:38, Michal Taborsky wrote:
> I am facing a performance problem with selecting a random row from a table.
> I do it like this:
>
>   SELECT * FROM table ORDER BY random() LIMIT 1
>
> But it seems that the random() function is quite slow, and the backend
> apparently evaluates the function for all rows (which is logical). On a
> table with a few thousand rows it takes seconds. Does anybody know a better
> solution for selecting one (or more) random rows from a table?

This isn't tested, but I'd imagine it would work faster:

  SELECT * FROM table
   WHERE id = (SELECT id FROM table ORDER BY random() LIMIT 1);

You still generate a random number for every record, but you only pull the PKs into memory (plus the one selected record) instead of the entire table. From what I understand of the internals this should be quicker.

Other options include keeping a lookup table with a sequential (no gaps) key, so you can use some PL language to generate a number in that range. If your key is almost sequential anyway, adding an error check (to make sure the record exists, and if not, try again) might be easier.
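A minimal sketch of the lookup-table idea, assuming the real table is called items with an integer primary key id (all names here are invented for illustration, not taken from the thread):

  CREATE TABLE item_lookup (
      n        serial  PRIMARY KEY,  -- gap-free positions 1..N
      item_id  integer NOT NULL      -- primary key of the real row
  );

  -- refresh the lookup; reset the sequence so numbering starts at 1 again
  DELETE FROM item_lookup;
  SELECT setval('item_lookup_n_seq', 1, false);
  INSERT INTO item_lookup (item_id) SELECT id FROM items;

  -- pick a position 1..N in the application or a PL function, then fetch
  -- by that position; an index lookup instead of sorting the whole table:
  SELECT i.*
    FROM items i, item_lookup l
   WHERE l.item_id = i.id
     AND l.n = 4242;                 -- 4242 stands in for the random position

The lookup table has to be refreshed whenever rows are added or removed, which is the main cost of this approach.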
This does decrease the cost, but only very little (10500 -> 9800), which is not the solution. The idea with the sequential index might work for some cases, that is true. Unfortunately not in this one, because I actually do not select a random row from a table, but from a complex select query resultset. But thanks for the suggestions.

Michal

"Arguile" <arguile@lucentstudios.com> wrote in message news:1028294789.352.8.camel@broadswd...
> This isn't tested, but I'd imagine it would work faster:
>
>   SELECT * FROM table
>    WHERE id = (SELECT id FROM table ORDER BY random() LIMIT 1);
>
> Other options include keeping a lookup table with a sequential (no gaps)
> key, so you can use some PL language to generate a number in that range.
> [...]
"Michal Taborsky" <michal@taborsky.cz> writes: > This does decrease the the cost, but only very little (10500 -> 9800), which > is not the solution. That thing with sequential index might work for some > cases, that is true. Unfortunately not in this one, because I actually do > not select random row from a table, but from a complex select query > resultset. But thanks for the suggestions. Why not SELECT * FROM table LIMIT 1 OFFSET random(SELECT count(*) FROM TABLE); I don't know if this is the exact syntax but you get the idea... -Doug
Weeeell, we are getting close to something that might work :) But the problem with this solution is that OFFSET refuses to take anything but a constant number (i.e. 10). This might be overcome by first getting the count in one query, then computing the random offset and building a new query with that number as the OFFSET argument. It will probably be faster, but it is not too clean: I would have to use some outside scripting or create a PL/pgSQL function for that. I have access to only the 7.1.3 version at the moment; maybe 7.2 can handle this the way you wrote it?

Michal

-----Original Message-----
From: Doug McNaught [mailto:doug@wireboard.com]
Sent: Friday, August 02, 2002 6:20 PM
To: Michal Taborsky
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Selecting random row

Why not

  SELECT * FROM table LIMIT 1 OFFSET random(SELECT count(*) FROM TABLE);

I don't know if this is the exact syntax, but you get the idea...

-Doug
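For reference, a rough sketch of the two-query approach described above. The subselect only stands in for the actual complex query, and the names items/price are invented for illustration; the application runs the count first, picks a random offset, and substitutes it as a literal into the second statement:

  -- Step 1: count the rows of the resultset (the subselect is a placeholder
  -- for the real complex query):
  SELECT count(*) FROM (SELECT * FROM items WHERE price > 100) AS sub;

  -- Step 2: the application computes offset = floor(random() * count) and
  -- builds the final statement with that literal, e.g. if the draw is 3172:
  SELECT *
    FROM (SELECT * FROM items WHERE price > 100) AS sub
   LIMIT 1 OFFSET 3172;

This costs one extra round trip and a full count of the resultset, but it avoids sorting every row by random().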