Re: ORDER BY random() LIMIT 1 slowness - Mailing list pgsql-general

From Jean-Luc Lachance
Subject Re: ORDER BY random() LIMIT 1 slowness
Date
Msg-id 3DFF4B15.C8B3A8B9@nsd.ca
Whole thread Raw
In response to ORDER BY random() LIMIT 1 slowness  ("Gavin M. Roy" <gmr@justsportsusa.com>)
List pgsql-general
Gavin,

Assuming that you have a serial column rand on poetry and you did not
delete any row,
here is my suggestion:

CREATE TABLE poetry ( rand SERIAL, ... );

SELECT * FROM poetry WHERE rand = (
  SELECT int8( curval( 'poetry_rand_seq') * random()));


JLL


Tom Lane wrote:
>
> "Gavin M. Roy" <gmr@justsportsusa.com> writes:
> > SELECT * FROM poetry ORDER BY random() LIMIT 1;
> > [ is slow for 35000 rows ]
>
> Yeah.  Basically this query is implemented as
>   (a) select all 35000 rows of "poetry";
>   (b) compute a random() value for each row;
>   (c) sort by the random() values;
>   (d) take the first row, discard the rest.
>
> The good news: this gives you a pretty-durn-random selection.  The bad
> news: you didn't really care about choosing a random ordering of the
> other 34999 rows, but it computed one anyway.
>
> This problem's been discussed before, but I've not seen any really
> ideal answer.  SQL is not designed to offer unpredictable results ;-)
>
> If you have a reasonably-uniformly-distributed index key on the rows,
> you can try something like
>
>         select * from poetry where key > (scale * random() + min)
>         order by key limit 1;
>
> (where scale and min are chosen to make the result cover the range of
> index keys).  Given an index on "key" this should pick a result quickly
> via an indexscan+limit plan.  However, if the distribution of keys isn't
> real uniform then you won't get real random results --- keys just above
> larger gaps will be selected with unfairly high probability.  You also
> have to know the min and max keys accurately.
>
> I recall some folks have suggested generating an actual random column
> (ie, something initialized with "default random()" and suitably indexed)
> but this seems like a lot of overhead ... you'd have to be mighty
> concerned with the exactness of your random selection to put up with
> that, I think.
>
> There are some other suggestions in the archives, IIRC.
>
>                         regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

pgsql-general by date:

Previous
From: "Shridhar Daithankar"
Date:
Subject: Re: Server testing.
Next
From: Neil Conway
Date:
Subject: Re: 7.3 Prepared statements