Home > mailing lists

Re: Query only slow on first run - Mailing list pgsql-performance

From	Craig James
Subject	Re: Query only slow on first run
Date	November 28, 2007 10:56:45
Msg-id	474D81A1.20301@emolecules.com Whole thread Raw
In response to	Re: Query only slow on first run (tmp <skrald@amossen.dk>)
List	pgsql-performance

Tree view

tmp wrote:
>> what exactly is that
>> "random_number" column
>
> A random float that is initialized when the row is created and never
> modified afterwards. The physical row ordering will clearly not match
> the random_number ordering. However, other queries uses a row ordering
> by the primary key so I don't think it would make much sense to make the
> index on random_number a clustering index just in order to speed up this
> single query.
>
>>  and why are you desirous of ordering by it?
>
> In order to simulate a random pick of K rows. See [1].

A trick that I used is to sample the random column once, and create a much smaller table of the first N rows, where N
isthe sample size you want, and use that. 

If you need a different N samples each time, you can create a temporary table, put your random N rows into that, do an
ANALYZE,and then join to this smaller table.  The overall performance can be MUCH faster even though you're creating
andpopulating a whole table, than the plan that Postgres comes up with. This seems wrong-headed (why shouldn't Postgres
beable to be as efficient on its own?), but it works. 

Craig

pgsql-performance by date:

From: Matthew
Date: 28 November 2007, 10:41:14
Subject: Re: TB-sized databases

From: Pablo Alcaraz
Date: 28 November 2007, 10:57:25
Subject: Re: TB-sized databases

Re: Query only slow on first run - Mailing list pgsql-performance

Previous

Next