Re: Cluster vs. non-cluster query planning - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Cluster vs. non-cluster query planning
Date
Msg-id 20060502212942.GE97354@pervasive.com
Whole thread Raw
In response to Re: Cluster vs. non-cluster query planning  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Mon, May 01, 2006 at 07:35:02PM -0400, Tom Lane wrote:
> Nolan Cafferky <Nolan.Cafferky@rbsinteractive.com> writes:
> > But, I'm guessing that random_page_cost = 1 is not a realistic value.
>
> Well, that depends.  If all your data can be expected to fit in memory
> then it is a realistic value.  (If not, you should be real careful not
> to make performance decisions on the basis of test cases that *do* fit
> in RAM...)
>
> In any case, if I recall your numbers correctly you shouldn't need to
> drop it nearly that far to get the thing to make the right choice.
> A lot of people run with random_page_cost set to 2 or so.

Also, the index scan cost estimator comments indicate that it does a
linear interpolation between the entimated cost for a perfectly
correlated table and a table with 0 correlation, but in fact the
interpolation is exponential, or it's linear based on the *square* of
the correlation, which just doesn't make much sense.

I did some investigating on this some time ago, but never got very  far
with it. http://stats.distributed.net/~decibel/summary.txt has some
info, and http://stats.distributed.net/~decibel/ has the raw data.
Graphing that data, if you only include correlations between 0.36 and
0.5, it appears that there is a linear correlation between correlation
and index scan time.

Of course this is very coarse data and it'd be great if someone did more
research in this area, preferably using pg_bench or other tools to
generate the data so that others can test this stuff as well. But even
with as rough as this data is, it seems to provide a decent indication
that it would be better to actually interpolate linearly based on
correlation, rather than correlation^2. This is a production machine so
I'd rather not go mucking about with testing such a change here.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

pgsql-performance by date:

Previous
From: "Tony Wasson"
Date:
Subject: Re: postgresql transaction id monitoring with nagios
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Postgres 7.4 and vacuum_cost_delay.