Re: Use of index in 7.0 vs 6.5

From Jeff Hoffmann
Subject Re: Use of index in 7.0 vs 6.5
Date
Msg-id 392D4731.A8624F8C@propertykey.com
In response to Use of index in 7.0 vs 6.5  (Ryan Bradetich <ryan_bradetich@hp.com>)
Responses Re: Use of index in 7.0 vs 6.5
List pgsql-sql
Tom Lane wrote:
> One way to put a thumb on the scales is to reduce the value of the SET
> variable random_page_cost.  The default value is 4.0, which seems to
> correspond more or less to reality, but reducing it to 3 or so would
> shift the planner pretty nicely in the direction of indexscans.
> 
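(for anyone following along: that's an ordinary per-session SET
variable, so it's cheap to experiment with --

    SET random_page_cost = 3.0;
    SHOW random_page_cost;    -- check what it's currently set to

and it goes back to the default when you reconnect.)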

as you may know, i've gone through a lot of things related to cost
estimates & i've convinced myself that (on paper) they're pretty good
(other than selectivity estimates, of course).  this may be heavily
influenced by my choice of OS & the corresponding disk cache management
scheme, but it seems to me that the sequential scan cost is a lot closer
to reality because the cache doesn't come into play as much there as it
does for index scans, which are (probably) heavily influenced by disk
cache.  unfortunately, you'd have to simulate some very high load
conditions to negate the disk cache effect and convince people that the
numbers are actually fairly correct.  even though i like to think i have
very important & demanding, end-all, be-all applications and that
postgresql should cater to my situation, i, like almost all other
users, operate in a very low load environment for the most part.
therefore, you may want to consider placing the defaults where they're
more advantageous to the casual user (or at least to the user who has a
small number of backends & is fairly dedicated to operating as a
database server).
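if you want to see what the planner actually thinks of each choice, you
can force its hand with the enable_* variables & compare the estimated
costs side by side.  a rough sketch (table & index names are made up
here; assume an index exists on parcel_id):

    SET enable_indexscan = off;
    EXPLAIN SELECT * FROM parcels WHERE parcel_id < 1000;  -- seq scan estimate

    SET enable_indexscan = on;
    SET enable_seqscan = off;
    EXPLAIN SELECT * FROM parcels WHERE parcel_id < 1000;  -- index scan estimate

    SET enable_seqscan = on;   -- put things back

the gap between those two cost numbers is largely driven by
random_page_cost.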

anyway, it seems to me that if you're going to fudge a number,
random_page_cost is probably the best place to do it, because it factors
so heavily into the total cost.  fudging the selectivity can be
frustrating because you can fudge for a certain table distribution, but
who's to say that distribution holds for every table?  since the
selectivity doesn't have a lot of real-world relevance, what do you gain
in the general case by fudging it?  at least when messing with
random_page_cost, you know that on paper the value of 4 is reasonably
close to reality and that you're factoring in disk cache, which should
be fairly constant on your system (assuming a similar volume of disk
accesses).
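to put rough numbers on it (page-fetch terms only, ignoring the cpu_*
costs & assuming the index scan touches the same pages in random order):

    seq scan, 1000 pages:    1000 * 1.0              = 1000
    index scan, 1000 pages:  1000 * random_page_cost = 4000  (default 4.0)
                                                     = 3000  (after SET to 3.0)

so dropping random_page_cost to 3 knocks a quarter off the index plan's
estimate without touching anything else.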

on another topic, is there a list somewhere of the variables you can
adjust with a SET command?  some of them have been invaluable in
learning about cost estimates, but i can't remember seeing them
documented anywhere -- i've just stumbled onto them, more or less.
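(fwiw, the ones i have stumbled onto i've been poking at like so; i
believe SHOW ALL dumps everything settable, though i haven't checked
which versions support it:

    SHOW random_page_cost;            -- print one variable's setting
    SET cpu_index_tuple_cost = 0.001; -- another planner cost knob
    SHOW ALL;                         -- list every settable variable

but a documented list would still be handy.)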

jeff

