Thread: Selecting random row
Hi everyone,

I am facing a performance problem with selecting a random row from a table. I do it like this:

  SELECT * FROM table ORDER BY random() LIMIT 1

But it seems that the random() function is quite slow, and the backend apparently evaluates the function for all rows (which is logical). On a table with a few thousand rows it takes seconds. Does anybody know a better solution for selecting one (or more) random rows from a table?

Thanks for any ideas,
Michal
On Fri, 2002-08-02 at 08:38, Michal Taborsky wrote:
> I am facing a performance problem with selecting a random row from a table.
> I do it like this:
>
>   SELECT * FROM table ORDER BY random() LIMIT 1
>
> But it seems that the random() function is quite slow, and the backend
> apparently evaluates the function for all rows (which is logical). On a
> table with a few thousand rows it takes seconds. Does anybody know a better
> solution for selecting one (or more) random rows from a table?

This isn't tested, but I'd imagine it would work faster:

  SELECT * FROM table
   WHERE id = (SELECT id FROM table ORDER BY random() LIMIT 1);

You still generate a random number for every record, but you only pull the PKs into memory (plus the one selected record) instead of the entire table. From what I understand of the internals this should be quicker.

Other options include keeping a lookup table with a sequential (no gaps) key, so you can use some PL language to generate a number in that range. If your key is almost sequential anyway, adding an error check (to make sure the record exists, and if not, try again) might be easier.
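A minimal sketch of the lookup-table idea, assuming the real table is called items with an integer primary key id (all names here are invented for illustration, not taken from the thread):

  CREATE TABLE item_lookup (
      n        serial  PRIMARY KEY,  -- gap-free positions 1..N
      item_id  integer NOT NULL      -- primary key of the real row
  );

  -- refresh the lookup; reset the sequence so numbering starts at 1 again
  DELETE FROM item_lookup;
  SELECT setval('item_lookup_n_seq', 1, false);
  INSERT INTO item_lookup (item_id) SELECT id FROM items;

  -- pick a position 1..N in the application or a PL function, then fetch
  -- by that position; an index lookup instead of sorting the whole table:
  SELECT i.*
    FROM items i, item_lookup l
   WHERE l.item_id = i.id
     AND l.n = 4242;                 -- 4242 stands in for the random position

The lookup table has to be refreshed whenever rows are added or removed, which is the main cost of this approach.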
This does decrease the cost, but only very little (10500 -> 9800), which is not the solution. The idea with the sequential index might work for some cases, that is true. Unfortunately not in this one, because I actually do not select a random row from a table, but from a complex select query resultset. But thanks for the suggestions.

Michal

"Arguile" <arguile@lucentstudios.com> wrote in message news:1028294789.352.8.camel@broadswd...
> This isn't tested, but I'd imagine it would work faster:
>
>   SELECT * FROM table
>    WHERE id = (SELECT id FROM table ORDER BY random() LIMIT 1);
>
> Other options include keeping a lookup table with a sequential (no gaps)
> key, so you can use some PL language to generate a number in that range.
> [...]
"Michal Taborsky" <michal@taborsky.cz> writes: > This does decrease the the cost, but only very little (10500 -> 9800), which > is not the solution. That thing with sequential index might work for some > cases, that is true. Unfortunately not in this one, because I actually do > not select random row from a table, but from a complex select query > resultset. But thanks for the suggestions. Why not SELECT * FROM table LIMIT 1 OFFSET random(SELECT count(*) FROM TABLE); I don't know if this is the exact syntax but you get the idea... -Doug
Weeeell, we are getting close to something that might work :) But the problem with this solution is that OFFSET refuses to take anything but a constant number (i.e. 10). This might be overcome by first getting the count in one query, then computing the random offset and building a new query with that number as the OFFSET argument. It will probably be faster, but it is not too clean: I would have to use some outside scripting or create a PL/pgSQL function for that. I have access to only the 7.1.3 version at the moment; maybe 7.2 can handle this the way you wrote it?

Michal

-----Original Message-----
From: Doug McNaught [mailto:doug@wireboard.com]
Sent: Friday, August 02, 2002 6:20 PM
To: Michal Taborsky
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Selecting random row

Why not

  SELECT * FROM table LIMIT 1 OFFSET random(SELECT count(*) FROM TABLE);

I don't know if this is the exact syntax, but you get the idea...

-Doug
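For reference, a rough sketch of the two-query approach described above. The subselect only stands in for the actual complex query, and the names items/price are invented for illustration; the application runs the count first, picks a random offset, and substitutes it as a literal into the second statement:

  -- Step 1: count the rows of the resultset (the subselect is a placeholder
  -- for the real complex query):
  SELECT count(*) FROM (SELECT * FROM items WHERE price > 100) AS sub;

  -- Step 2: the application computes offset = floor(random() * count) and
  -- builds the final statement with that literal, e.g. if the draw is 3172:
  SELECT *
    FROM (SELECT * FROM items WHERE price > 100) AS sub
   LIMIT 1 OFFSET 3172;

This costs one extra round trip and a full count of the resultset, but it avoids sorting every row by random().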