Re: Selecting K random rows - efficiently! - Mailing list pgsql-general

From	cluster
Subject	Re: Selecting K random rows - efficiently!
Date	October 24, 2007 11:06:56
Msg-id	ffnid8$1q2t$1@news.hub.org
In response to	Re: Selecting K random rows - efficiently! (Martijn van Oosterhout <kleptog@svana.org>)
List	pgsql-general

> How important is true randomness?

The goal is an even distribution, but so far I have not seen any way
to produce any kind of random sample efficiently. Note the word
"efficiently". The naive way of taking a random sample of size K:
    (SELECT * FROM mydata ORDER BY random() LIMIT <K>)
is clearly not an option for performance reasons: it has to read and
rank every row in the table just to return K of them. It shouldn't be
necessary to explain that further. :-)
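
To make the cost difference concrete, here is a minimal Python sketch (not
PostgreSQL code, and not a solution proposed in this thread). naive_sample
mimics what ORDER BY random() LIMIT K has to do: one random key per row,
then an ordering over all of them. id_probe_sample is a hypothetical
alternative that only works under a strong assumption: the table has a
dense integer id with no gaps, so K ids can be drawn directly and looked
up. All function and variable names are made up for illustration.

```python
import random

def naive_sample(rows, k, rng):
    # Mirrors ORDER BY random() LIMIT k: every row gets a random key,
    # all rows are sorted by that key, and only the first k are kept.
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed[:k]]

def id_probe_sample(rows_by_id, min_id, max_id, k, rng):
    # ASSUMPTION: ids are dense integers in [min_id, max_id] with no gaps.
    # Draw k distinct ids and fetch only those rows (k lookups, no sort).
    ids = rng.sample(range(min_id, max_id + 1), k)
    return [rows_by_id[i] for i in ids]

rng = random.Random(0)
table = {i: f"row-{i}" for i in range(1, 1001)}  # stand-in for mydata

naive = naive_sample(list(table.values()), 5, rng)
probed = id_probe_sample(table, 1, 1000, 5, rng)
```

The first function touches all 1000 rows to return 5; the second touches
only 5, but it stops being uniform (or fails outright) as soon as the id
range has gaps, which is exactly the hard part on a real table.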

> Search the archives, there have been solutions proposed before, though
> they probably aren't very quick...

As the subject suggests, performance really matters, and searching the
archives turns up only poor solutions (my first post explains why).
