
Re: Selecting K random rows - efficiently! - Mailing list pgsql-general

From	cluster
Subject	Re: Selecting K random rows - efficiently!
Date	October 26, 2007 19:37:30
Msg-id	472240CE.5000102@amossen.dk
In response to	Re: Selecting K random rows - efficiently! ("John D. Burger" <john@mitre.org>)
List	pgsql-general


> All you're doing is picking random =subsequences= from the same
> permutation of the original data.

You have some good points in your reply. I am very much aware of the
non-random behavior you point out for the "static random-value column"
approach, but at least it is fast, which is a requirement. :-(
However, if the lifetime of the individual rows is short, the
behaviour is, luckily, sufficiently random for my specific purpose.
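
To make the trade-off concrete, here is a minimal Python sketch of the
"static random-value column" idea as I understand it from this thread.
The names (build_table, sample_k) and the wrap-around at the end of the
index are assumptions made for illustration, not anything PostgreSQL
provides. A sorted list stands in for an index on the random column.

```python
import bisect
import random

def build_table(ids, rng):
    # Each row gets one random value at insert time; it never changes.
    rows = sorted((rng.random(), row_id) for row_id in ids)
    keys = [r for r, _ in rows]          # the indexed random column
    order = [row_id for _, row_id in rows]  # row ids in index order
    return keys, order

def sample_k(keys, order, k, rng):
    # Seek to a random point in the index and read the next k rows.
    # Wrapping around at the end is an assumption of this sketch.
    start = bisect.bisect_left(keys, rng.random())
    n = len(order)
    return [order[(start + i) % n] for i in range(min(k, n))]
```

Every call returns a contiguous run of the same fixed ordering, which is
exactly the "random subsequences of one permutation" behaviour John
describes: rows that are neighbours in the index always show up together.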

I furthermore realize that the only way to get truly random samples is
to ORDER BY random(), but this is an unacceptably slow method for large
data sets.
Even though it is not trivial at all, there ARE indeed algorithms out
there [1,2] for picking random subsets from a result set, but these are
(sadly) not implemented in PostgreSQL.
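
As an illustration of what such an algorithm can look like, below is a
minimal Python sketch of reservoir sampling (often called Algorithm R),
a well-known single-pass technique. I am not claiming it is the method
described in [1] or [2]; the function name and the client-side setting
are assumptions for the example. It draws a uniform sample of k rows
from an iterable of unknown length without sorting it.

```python
import random

def reservoir_sample(rows, k, rng=random):
    # Keep the first k rows, then let the n-th row (1-based) replace a
    # random reservoir slot with probability k/n.
    reservoir = []
    for n, row in enumerate(rows, start=1):
        if n <= k:
            reservoir.append(row)
        else:
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                reservoir[j] = row
    return reservoir
```

Note that this still reads every row once, so it avoids the sort that
ORDER BY random() implies but not the full scan.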


References:
[1] http://portal.acm.org/citation.cfm?id=304206

[2] http://compstat.chonbuk.ac.kr/Sisyphus/CurrentStudy/Sampling/vldb86.pdf
