Thread: Selecting random row

Selecting random row

From
"Michal Taborsky"
Date:
Hi everyone,

I am facing a performance problem with selecting a random row from a table.
I do it like this:

SELECT * FROM table ORDER BY random() LIMIT 1

But it seems that the random() function is quite slow and backend apparently
evaluates the function for all rows (which is logical). On a table with few
thousand rows it takes seconds. Does anybody know a better solution to
selecting one (or more) random rows from a table ?

Thanks for any ideas,
Michal



Re: Selecting random row

From
Arguile
Date:
On Fri, 2002-08-02 at 08:38, Michal Taborsky wrote:
> I am facing a performance problem with selecting a random row from a table.
> I do it like this:
>
> SELECT * FROM table ORDER BY random() LIMIT 1
>
> But it seems that the random() function is quite slow and backend apparently
> evaluates the function for all rows (which is logical). On a table with few
> thousand rows it takes seconds. Does anybody know a better solution to
> selecting one (or more) random rows from a table ?

This isn't tested but I'd imagine it would work faster.

SELECT * FROM table WHERE id = (SELECT id FROM table ORDER BY random()
LIMIT 1);

You still generate a random number for every record but only pull the
PKs into mem (and the one selected record) instead of the entire table.
From what I understand of the internals this should be quicker.

Other options include keeping a lookup table with a sequential (no
breaks) key so you can use some PL language to generate a number in that
range. If your key is almost sequential anyways adding an error check -
to make sure the record exists and if not try again - might be easier.



Re: Selecting random row

From
"Michal Taborsky"
Date:
This does decrease the the cost, but only very little (10500 -> 9800), which
is not the solution. That thing with sequential index might work for some
cases, that is true. Unfortunately not in this one, because I actually do
not select random row from a table, but from a complex select query
resultset. But thanks for the suggestions.

Michal

"Arguile" <arguile@lucentstudios.com> p��e v diskusn�m p��sp�vku
news:1028294789.352.8.camel@broadswd...
> On Fri, 2002-08-02 at 08:38, Michal Taborsky wrote:
> > I am facing a performance problem with selecting a random row from a
table.
> > I do it like this:
> >
> > SELECT * FROM table ORDER BY random() LIMIT 1
> >
> > But it seems that the random() function is quite slow and backend
apparently
> > evaluates the function for all rows (which is logical). On a table with
few
> > thousand rows it takes seconds. Does anybody know a better solution to
> > selecting one (or more) random rows from a table ?
>
> This isn't tested but I'd imagine it would work faster.
>
> SELECT * FROM table WHERE id = (SELECT id FROM table ORDER BY random()
> LIMIT 1);
>
> You still generate a random number for every record but only pull the
> PKs into mem (and the one selected record) instead of the entire table.
> From what I understand of the internals this should be quicker.
>
> Other options include keeping a lookup table with a sequential (no
> breaks) key so you can use some PL language to generate a number in that
> range. If your key is almost sequential anyways adding an error check -
> to make sure the record exists and if not try again - might be easier.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org



Re: Selecting random row

From
Doug McNaught
Date:
"Michal Taborsky" <michal@taborsky.cz> writes:

> This does decrease the the cost, but only very little (10500 -> 9800), which
> is not the solution. That thing with sequential index might work for some
> cases, that is true. Unfortunately not in this one, because I actually do
> not select random row from a table, but from a complex select query
> resultset. But thanks for the suggestions.

Why not

SELECT * FROM table LIMIT 1 OFFSET random(SELECT count(*) FROM TABLE);

I don't know if this is the exact syntax but you get the idea...

-Doug

Re: Selecting random row

From
Michal Taborsky
Date:
Weeeell, we are getting close to something, that might work :) But the
problem with this solution is, that OFFSET refuses to take anything but
constant number (ie. 10). This might be overcome by first getting the count
in one query, then compute the random offest and build a new query with this
number as an OFFSET argument. It will probably be faster, but it is not too
clean  - i would have to use some outside scripting or create a pl/pgSQL
function for that. I have access to only 7.1.3 version at the moment, maybe
7.2 can handle this like you wrote ?

Michal

-----Original Message-----
From: Doug McNaught [mailto:doug@wireboard.com]
Sent: Friday, August 02, 2002 6:20 PM
To: Michal Taborsky
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Selecting random row


"Michal Taborsky" <michal@taborsky.cz> writes:

> This does decrease the the cost, but only very little (10500 -> 9800),
which
> is not the solution. That thing with sequential index might work for some
> cases, that is true. Unfortunately not in this one, because I actually do
> not select random row from a table, but from a complex select query
> resultset. But thanks for the suggestions.

Why not

SELECT * FROM table LIMIT 1 OFFSET random(SELECT count(*) FROM TABLE);

I don't know if this is the exact syntax but you get the idea...

-Doug