Re: Use of index in 7.0 vs 6.5 - Mailing list pgsql-sql
From: Jeff Hoffmann
Subject: Re: Use of index in 7.0 vs 6.5
Date:
Msg-id: 392D4731.A8624F8C@propertykey.com
In response to: Use of index in 7.0 vs 6.5 (Ryan Bradetich <ryan_bradetich@hp.com>)
Responses: Re: Use of index in 7.0 vs 6.5
List: pgsql-sql
Tom Lane wrote:
> One way to put a thumb on the scales is to reduce the value of the SET
> variable random_page_cost.  The default value is 4.0, which seems to
> correspond more or less to reality, but reducing it to 3 or so would
> shift the planner pretty nicely in the direction of indexscans.

as you may know, i've gone through a lot of things related to cost estimates & i've convinced myself that (on paper) they're pretty good (other than selectivity estimates, of course). this may be heavily influenced by my choice of OS & the corresponding disk cache management scheme, but it seems to me that the sequential scan cost is a lot closer to reality because cache doesn't come into play as much there, whereas the index scans (probably) are heavily influenced by disk cache. unfortunately, you'd have to simulate some very high load conditions to negate the disk cache effect and convince people that the numbers are actually fairly correct. even though i like to think i have very important & demanding, end-all, be-all applications and that postgresql should cater to my situation, i, like almost all of the other users, operate in very low-load environments for the most part. therefore, you may want to consider placing the defaults where they are more advantageous to the casual user (or at least to the user who has a small number of backends & a machine fairly dedicated to operating as a database server).

anyway, it seems to me that if you're going to fudge a number, random_page_cost is probably the best place to do it, because it factors so heavily into the total cost. fudging the selectivity can be frustrating because you can fudge for a certain table distribution, but who's to say that distribution holds for all tables? since the selectivity doesn't have a lot of real-world relevance, what do you gain in the general case by fudging it? at least by messing with random_page_cost, you know that on paper the value of 4 is reasonably close to reality and that you're really factoring in disk cache, which should be fairly constant on your system (assuming a similar volume of disk accesses).

on another topic, is there a list somewhere of the variables that you can adjust with a SET command? some of the variables have been invaluable in learning about cost estimates, but i can't remember seeing a list of them documented anywhere -- i've just stumbled onto them, more or less.

jeff
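
p.s. -- for anybody who wants to try tom's suggestion, the experiment in psql looks roughly like the sketch below. the table & query here are made up for illustration (substitute your own), and older releases may want the SET value quoted, but the general shape should hold:

  -- see what plan the planner picks under the default cost model
  EXPLAIN SELECT * FROM parcels WHERE county_id = 42;

  -- nudge the planner toward index scans (affects this session only)
  SET random_page_cost = 3;

  -- compare the plan & estimated costs against the first EXPLAIN
  EXPLAIN SELECT * FROM parcels WHERE county_id = 42;

  -- put things back the way they were when you're done
  RESET random_page_cost;

since SET only changes your own backend, it's a cheap way to experiment without disturbing anybody else connected to the server.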
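
p.p.s. -- on the SET-variable list question: i don't know offhand whether 7.0 can enumerate them itself, but later releases can list every variable & its current value straight from the server, along the lines of:

  -- show every settable variable with its current value
  SHOW ALL;

  -- or, in releases that have the pg_settings view, query it like a table
  SELECT name, setting FROM pg_settings WHERE name LIKE '%cost%';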