Thread: selecting random row values in postgres

selecting random row values in postgres

From
Sumeet
Date:
Hi all,<br /><br />I'm trying to write  a query to select random values from a set of 'GROUP BY'<br />....see the
scenariobelow to understand the problem here (the actual problem cannot be discussed here so i'm taking an example
scenario)<br /><br />Assume there is a table<br /><br />id | name | year_of_birth<br /><br />query: I want  to select
foreach year_of_birth a random name.<br /><br />--> so i do a group by year_of_birth, now i have a set of names, is
there any function to select just one name from these set of names. <br />The current approach i'm using to solve this
problemis <br /><br />1) getting these names in a single string using a custom function 'group_concat'<br />2) Convert
thesingle string into an array <br />3) use postgresql random function to generate  a random number <br />4) us the
randomnumber to select a element from the array previously created.<br /><br clear="all" />The solution is there but
it'skinda hack, is there any other better way of solving this problem.<br /><br /><br />Thanks, <br />Sumeet  

Re: selecting random row values in postgres

From
"Rajesh Kumar Mallah"
Date:
<br /><br />On 2/24/07, Sumeet <<a href="mailto:asumeet@gmail.com">asumeet@gmail.com</a>> wrote:<br />> Hi
all,<br/>> <br />> I'm trying to write  a query to select random values from a set of 'GROUP<br />> BY' <br
/>>....see the scenario below to understand the problem here (the actual<br />> problem cannot be discussed here
soi'm taking an example scenario) <br />> <br />> Assume there is a table<br />> <br />> id | name |
year_of_birth<br />> <br />> query: I want  to select for each year_of_birth a random name.<br /><br />Dear
Sumeet<br/><br />postgresql DISTINCT ON may be of help , but its not standard sql.<br /><br />regds<br />mallah.<br
/><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">tradein_clients=>SELECT * from temp.test;</span><br style="font-family: courier new,monospace;"
/><spanstyle="font-family: courier new,monospace;">+----+------+-----+</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">| id | name | yob |</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;"> +----+------+-----+</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">|  1 | A    |   2
|</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;"> |  2 |
B    |  2 |</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">|  3| C    |   2 |</span><br style="font-family: courier new,monospace;" /><span style="font-family:
couriernew,monospace;"> |  4 | D    |   1 |</span><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;">|  5 | E    |   1 |</span><br style="font-family: courier new,monospace;"
/><spanstyle="font-family: courier new,monospace;"> |  6 | F    |   1 |</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">+----+------+-----+</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;"> (6 rows)</span><br style="font-family:
couriernew,monospace;" /><br style="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">tradein_clients=>SELECT distinct on (yob) id,name,yob  from temp.test order by
yob,random();</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">+----+------+-----+</span><brstyle="font-family: courier new,monospace;" /><span style="font-family:
couriernew,monospace;"> | id | name | yob |</span><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;">+----+------+-----+</span><br style="font-family: courier new,monospace;"
/><spanstyle="font-family: courier new,monospace;"> |  5 | E    |   1 |</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">|  1 | A    |   2 |</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;"> +----+------+-----+</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">(2 rows)</span><br
style="font-family:courier new,monospace;" /><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;">tradein_clients=> SELECT distinct on (yob) id,name,yob  from temp.test
orderby yob,random();</span><br style="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">+----+------+-----+</span><br style="font-family: courier new,monospace;" /><span style="font-family:
couriernew,monospace;">| id | name | yob |</span><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;"> +----+------+-----+</span><br style="font-family: courier new,monospace;"
/><spanstyle="font-family: courier new,monospace;">|  4 | D    |   1 |</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;"> |  1 | A    |   2 |</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;">+----+------+-----+</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;"> (2 rows)</span><br
style="font-family:courier new,monospace;" /><br style="font-family: courier new,monospace;" /><br style="font-family:
couriernew,monospace;" /><br style="font-family: courier new,monospace;" /><br />> <br />> --> so i do a group
byyear_of_birth, now i have a set of names, is there <br />> any function to select just one name from these set of
names.<br />> The current approach i'm using to solve this problem is <br />> <br />> 1) getting these names
ina single string using a custom function <br />> 'group_concat'<br />> 2) Convert the single string into an
array<br />> 3) use postgresql random function to generate  a random number <br />> 4) us the random number to
selecta element from the array previously <br />> created.<br />> <br />> The solution is there but it's kinda
hack,is there any other better way of<br />> solving this problem.<br />> <br />> <br />> Thanks, <br
/>>Sumeet <br /> 

Re: selecting random row values in postgres

From
Sumeet
Date:
Thanks Buddy, really appreciate ur help on this

....problem solved...

Is there any way this query can be optimized...i'm running it on a huge table with joins

- Sumeet


On 2/23/07, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:


On 2/24/07, Sumeet <asumeet@gmail.com> wrote:
> Hi all,
>
> I'm trying to write  a query to select random values from a set of 'GROUP
> BY'
> ....see the scenario below to understand the problem here (the actual
> problem cannot be discussed here so i'm taking an example scenario)
>
> Assume there is a table
>
> id | name | year_of_birth
>
> query: I want  to select for each year_of_birth a random name.

Dear Sumeet

postgresql DISTINCT ON may be of help , but its not standard sql.

regds
mallah.

tradein_clients=> SELECT * from temp.test;
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  1 | A    |   2 |
|  2 | B    |   2 |
|  3 | C    |   2 |
|  4 | D    |   1 |
|  5 | E    |   1 |
|  6 | F    |   1 |
+----+------+-----+
(6 rows)

tradein_clients=> SELECT distinct on (yob) id,name,yob  from temp.test order by yob,random();
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  5 | E    |   1 |
|  1 | A    |   2 |
+----+------+-----+
(2 rows)

tradein_clients=> SELECT distinct on (yob) id,name,yob  from temp.test order by yob,random();
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  4 | D    |   1 |
|  1 | A    |   2 |
+----+------+-----+
(2 rows)





>
> --> so i do a group by year_of_birth, now i have a set of names, is there
> any function to select just one name from these set of names.
> The current approach i'm using to solve this problem is
>
> 1) getting these names in a single string using a custom function
> 'group_concat'
> 2) Convert the single string into an array
> 3) use postgresql random function to generate  a random number
> 4) us the random number to select a element from the array previously
> created.
>
> The solution is there but it's kinda hack, is there any other better way of
> solving this problem.
>
>
> Thanks,
> Sumeet



--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.

Re: selecting random row values in postgres

From
"Rajesh Kumar Mallah"
Date:


On 2/24/07, Sumeet <asumeet@gmail.com> wrote:
got it....I just figured out that i dont need the ORDER BY clause even the first row selected by the 'DISTINCT ON' would solve the problem.

Dear Sumeet,

if order by is not done there is no certainty  about which  row gets
selected. usually same row keeps getting selected. but if you want
a really random rows to come  order by is required.

if you need certainty about the particular row being selected
order by a non random() column is required.

regds
mallah.
 

Thanks for all you help
-Sumeet.


On 2/23/07, Sumeet <asumeet@gmail.com> wrote:
Thanks Buddy, really appreciate ur help on this

....problem solved...

Is there any way this query can be optimized...i'm running it on a huge table with joins

- Sumeet



On 2/23/07, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:


On 2/24/07, Sumeet <asumeet@gmail.com> wrote:
> Hi all,
>
> I'm trying to write  a query to select random values from a set of 'GROUP
> BY'
> ....see the scenario below to understand the problem here (the actual
> problem cannot be discussed here so i'm taking an example scenario)
>
> Assume there is a table
>
> id | name | year_of_birth
>
> query: I want  to select for each year_of_birth a random name.

Dear Sumeet

postgresql DISTINCT ON may be of help , but its not standard sql.

regds
mallah.

tradein_clients=> SELECT * from temp.test;
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  1 | A    |   2 |
|  2 | B    |   2 |
|  3 | C    |   2 |
|  4 | D    |   1 |
|  5 | E    |   1 |
|  6 | F    |   1 |
+----+------+-----+
(6 rows)

tradein_clients=> SELECT distinct on (yob) id,name,yob  from temp.test order by yob,random();
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  5 | E    |   1 |
|  1 | A    |   2 |
+----+------+-----+
(2 rows)

tradein_clients=> SELECT distinct on (yob) id,name,yob  from temp.test order by yob,random();
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  4 | D    |   1 |
|  1 | A    |   2 |
+----+------+-----+
(2 rows)





>
> --> so i do a group by year_of_birth, now i have a set of names, is there
> any function to select just one name from these set of names.
> The current approach i'm using to solve this problem is
>
> 1) getting these names in a single string using a custom function
> 'group_concat'
> 2) Convert the single string into an array
> 3) use postgresql random function to generate  a random number
> 4) us the random number to select a element from the array previously
> created.
>
> The solution is there but it's kinda hack, is there any other better way of
> solving this problem.
>
>
> Thanks,
> Sumeet



--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.



--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.

Re: selecting random row values in postgres

From
Tommy Gildseth
Date:
Sumeet wrote:
> Thanks Buddy, really appreciate ur help on this
>
> ....problem solved...
>
> Is there any way this query can be optimized...i'm running it on a 
> huge table with joins

ORDER BY rand() is rather slow on large datasets, since the db has to 
actually generate a random value for each row in the table, before being 
able use it to sort by. Preferable ways to do this include f.ex:
SELECT max(id) FROM table;
SELECT * FROM table WHERE id > [rand value between 0 and max(id)] ORDER 
BY id LIMIT 1;

This means you need to execute 2 queries, and it can also be a good idea 
to somehow cache the number of rows/largest ID of the table, for quicker 
performence.
You can find an interesting discussion on this topic at 
http://thedailywtf.com/Comments/Finding_Random_Rows.aspx (yeah, I know. 
thedayilywtf.com isn't normally what I'd use as a reference for anything 
:-) )

-- 
Tommy


Re: selecting random row values in postgres

From
Geoff Tolley
Date:
Tommy Gildseth wrote:
> Sumeet wrote:
>> Thanks Buddy, really appreciate ur help on this
>>
>> ....problem solved...
>>
>> Is there any way this query can be optimized...i'm running it on a 
>> huge table with joins
> 
> ORDER BY rand() is rather slow on large datasets, since the db has to 
> actually generate a random value for each row in the table, before being 
> able use it to sort by. Preferable ways to do this include f.ex:
> SELECT max(id) FROM table;
> SELECT * FROM table WHERE id > [rand value between 0 and max(id)] ORDER 
> BY id LIMIT 1;

I'd have thought that in most cases the slowness would be due more to the 
sort than the cost of generating the random numbers. Your pair of queries 
neatly avoid doing the sort, but it does raise the question, does it matter 
if not all rows have the same chance of being picked? As an extreme example 
in the above, if there are id's 1-100 and id = 100000 in the table, the 
last one will almost always be picked.

I wonder if the random aggregate that Josh Berkus wrote could be adapted to 
return a record type rather than a random value of a single column? Its big 
advantage is that it requires no sorting and all entries are equiprobable. 
Unfortunately I'm a bit short of time at this end of a Friday to do much 
but give you the URL:

http://www.powerpostgresql.com/Random_Aggregate

HTH,
Geoff