Thread: bad advice in FAQ 4.1
Hello

I found a lot of slow queries in some databases which I checked, based on advice 4.1 from the FAQ:

To SELECT a random row, use:
SELECT col
FROM tab
ORDER BY random()
LIMIT 1;

It's robust but slow on bigger tables. Can we add some better solutions?

Regards
Pavel Stehule
Hubert recently posted his thoughts on this topic: http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/
I've encountered this problem several times in web development, and every time found that the best (in terms of performance) solution is to use some pseudo-random approach (such as ">= random() limit 1" or "limit 1 offset random()*N", or even pre-caching rows on the app side).
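For illustration, the two shortcuts mentioned above look roughly like this in SQL (a sketch only; the table tab, its integer primary key id, and the row count of 10000 are assumptions, and each variant has its own caveats):

-- Variant 1: probe an indexed id at a random point. Assumes few gaps in id;
-- rows that directly follow a gap are picked more often than others.
SELECT col
  FROM tab
 WHERE id >= (SELECT (random() * max(id))::int FROM tab)
 ORDER BY id
 LIMIT 1;

-- Variant 2: skip a random number of rows. Avoids sorting the whole table,
-- but still reads up to OFFSET rows, and returns nothing if the offset
-- overshoots the real row count.
SELECT col
  FROM tab
OFFSET (random() * 10000)::int
 LIMIT 1;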
On 10/9/07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> I found a lot of slow queries in some databases which I checked, based on
> advice 4.1 from the FAQ:
>
> To SELECT a random row, use:
> SELECT col
> FROM tab
> ORDER BY random()
> LIMIT 1;
>
> It's robust but slow on bigger tables. Can we add some better solutions?
--
Best regards,
Nikolay
2007/10/9, Nikolay Samokhvalov <samokhvalov@gmail.com>:
> Hubert recently posted his thoughts on this topic:
> http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/
>
> I've encountered this problem several times in web development, and every
> time found that the best (in terms of performance) solution is to use some
> pseudo-random approach (such as ">= random() limit 1" or "limit 1 offset
> random()*N", or even pre-caching rows on the app side).

I know this article, but we cannot link from the FAQ to a private (unstable) blog. It would have to be an article on techdoc.postgresql.org.

Pavel

> On 10/9/07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > Hello
> >
> > I found a lot of slow queries in some databases which I checked, based on
> > advice 4.1 from the FAQ:
> >
> > To SELECT a random row, use:
> > SELECT col
> > FROM tab
> > ORDER BY random()
> > LIMIT 1;
> >
> > It's robust but slow on bigger tables. Can we add some better solutions?
>
> --
> Best regards,
> Nikolay
"Nikolay Samokhvalov" <samokhvalov@gmail.com> writes: > Hubert recently posted his thoughts on this topic: > http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/ > > I've encountered with this problem several times in web development and > every time found out that the best (in terms of performance) solution is to > use some pseudo random approach (such as ">= random() limit 1" or "limit 1 > offset random()*N" or even pre-caching rows on app side). "ORDER BY random() LIMIT 1" should be faster in 8.3 due to the bounded-sort optimization. It should be basically the same as the two options above as far as how many comparisons are done and how much memory is used. It does have to call random() for every record whereas the solutions above only call random() once. But I think all of these are basically the same to a first degree approximation. They all have to do a scan of all the records being considered. If you want something faster you need a solution which can use an index to scan only the target record. There are ways of doing that but they require some application knowledge. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
2007/10/9, Gregory Stark <stark@enterprisedb.com>:
> "Nikolay Samokhvalov" <samokhvalov@gmail.com> writes:
>
> > Hubert recently posted his thoughts on this topic:
> > http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/
> >
> > I've encountered this problem several times in web development, and every
> > time found that the best (in terms of performance) solution is to use some
> > pseudo-random approach (such as ">= random() limit 1" or "limit 1 offset
> > random()*N", or even pre-caching rows on the app side).
>
> "ORDER BY random() LIMIT 1" should be faster in 8.3 due to the bounded-sort
> optimization. It should be basically the same as the two options above as far
> as how many comparisons are done and how much memory is used. It does have to
> call random() for every record, whereas the solutions above only call
> random() once.
>
> But I think all of these are basically the same to a first degree
> approximation. They all have to do a scan of all the records being
> considered. If you want something faster, you need a solution which can use
> an index to scan only the target record. There are ways of doing that, but
> they require some application knowledge.
>

It always needs a seq scan :(, and takes space in the buffer cache. Solutions based on a randomly generated PK are much faster. I collaborate with one of my customers who shows random products (out of 10K products) on every page of an eShop, and he cannot understand why ORDER BY random() LIMIT is a bad trick, because this trick is in the PostgreSQL FAQ.

Pavel
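The "randomly generated PK" idea can also be kept inside the database as a pre-assigned random key (a sketch only; the products table, the random_key column, and the index name are assumptions based on the eShop example above): give every row its own random value once, index it, and probe that index instead of sorting the whole table.

-- One-time setup: add and fill an indexed random key.
ALTER TABLE products ADD COLUMN random_key double precision;
UPDATE products SET random_key = random();
ALTER TABLE products ALTER COLUMN random_key SET DEFAULT random();
CREATE INDEX products_random_key_idx ON products (random_key);

-- Per page view: one index probe instead of a 10K-row sort. The
-- (SELECT random()) sub-select is evaluated once, so the index on
-- random_key can be used. If the draw lands above the largest
-- random_key, nothing comes back; retry, or wrap around with a
-- second query ordered from the start.
SELECT *
  FROM products
 WHERE random_key >= (SELECT random())
 ORDER BY random_key
 LIMIT 1;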
On Tue, Oct 09, 2007 at 06:40:23PM +0200, Pavel Stehule wrote:
> It always needs a seq scan :(, and takes space in the buffer cache. Solutions
> based on a randomly generated PK are much faster. I collaborate with one of
> my customers who shows random products (out of 10K products) on every page
> of an eShop, and he cannot understand why ORDER BY random() LIMIT is a bad
> trick, because this trick is in the PostgreSQL FAQ.

It's the only trick that works in all situations though. There are of course faster methods, but they require information about the distribution of the values, the type, PKs, indexes, etc...

The standard does have stuff relating to extracting samples from tables, but they're not implemented.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
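For reference, the standard feature alluded to here is the TABLESAMPLE clause from SQL:2003; PostgreSQL did not gain it until version 9.5, long after this thread. It samples a fraction of the table rather than returning one uniformly random row (the table and column names below are assumptions):

-- TABLESAMPLE SYSTEM (1) reads roughly 1% of the table's pages;
-- the LIMIT then takes one row from that sample, which is cheap but
-- not a uniformly random pick.
SELECT col
  FROM tab TABLESAMPLE SYSTEM (1)
 LIMIT 1;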
2007/10/9, Martijn van Oosterhout <kleptog@svana.org>:
> On Tue, Oct 09, 2007 at 06:40:23PM +0200, Pavel Stehule wrote:
> > It always needs a seq scan :(, and takes space in the buffer cache.
> > Solutions based on a randomly generated PK are much faster. I collaborate
> > with one of my customers who shows random products (out of 10K products)
> > on every page of an eShop, and he cannot understand why ORDER BY random()
> > LIMIT is a bad trick, because this trick is in the PostgreSQL FAQ.
>
> It's the only trick that works in all situations though. There are of course
> faster methods, but they require information about the distribution of the
> values, the type, PKs, indexes, etc...
>
> The standard does have stuff relating to extracting samples from tables, but
> they're not implemented.
>

I agree. I don't want to remove it from the FAQ. I would like to add a note that sometimes it is necessary to find a different trick.

Regards
Pavel