Re: Selecting random rows efficiently - Mailing list pgsql-performance

From Tom Lane
Subject Re: Selecting random rows efficiently
Date
Msg-id 25757.1062256444@sss.pgh.pa.us
Whole thread Raw
In response to Re: Selecting random rows efficiently  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I said:
> 3. Your query now looks like
>     SELECT * FROM table WHERE random_id >= random()
>     ORDER BY random_id LIMIT 1;

Correction: the above won't give quite the right query because random()
is marked as a volatile function.  You can hide the random() call inside
a user-defined function that you (misleadingly) mark stable, or you can
just stick it into a sub-select:

regression=# explain select * from foo WHERE random_id >= (select random())
regression-# ORDER BY random_id LIMIT 1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=0.01..0.15 rows=1 width=8)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Index Scan using fooi on foo  (cost=0.00..45.50 rows=334 width=8)
         Index Cond: (random_id >= $0)
(5 rows)

This technique is probably safer against future planner changes,
however:

regression=# create function oneshot_random() returns float8 as
regression-# 'select random()' language sql stable;
CREATE FUNCTION
regression=# explain select * from foo WHERE random_id >= oneshot_random()
regression-# ORDER BY random_id LIMIT 1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=0.00..0.14 rows=1 width=8)
   ->  Index Scan using fooi on foo  (cost=0.00..46.33 rows=334 width=8)
         Index Cond: (random_id >= oneshot_random())
(3 rows)

The point here is that an indexscan boundary condition has to use stable
or immutable functions.  By marking oneshot_random() stable, you
essentially say that it's okay to evaluate it only once per query,
rather than once at each row.

            regards, tom lane

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: How to force Nested Loop plan?
Next
From: "Matt Clark"
Date:
Subject: Re: Hardware recommendations to scale to silly load