From Josh Berkus
Subject Re: More thoughts about planner's cost estimates
Msg-id 447F23A9.1090503@agliodbs.com
In response to More thoughts about planner's cost estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom,

As you know, this is something I think about a bit too, though not 
nearly as deeply as you.

> In general it seems to me that for CPU-bound databases, the default values
> of the cpu_xxx_cost variables are too low.  I am tempted to raise the
> default value of cpu_index_tuple_cost to 0.005, which would be half of
> cpu_tuple_cost and twice cpu_operator_cost; that seems more in line with
> my feel for the relative costs of things.  For a CPU-bound database all
> three would need to go up more than that.  But rather than telling people
> to manipulate all three of these variables individually, I think it might
> also be a good idea to provide a new GUC variable named something like
> "cpu_speed_scale" that would just be a multiplier for the other variables.

Yes, please!  I've done a bit of playing around with the cpu_* 
variables, and frankly have always manipulated them by the same multiple.

Also, my testing has shown that on *large* databases (20x RAM +) with 
fast processors (Opteron) the cpu_* values are too high.  But a lot of 
that is offsetting estimates which are too high elsewhere, as is 
random_page_cost.
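
To put numbers on "the same multiple": here's a minimal sketch of how a 
single cpu_speed_scale knob could work, assuming the defaults Tom mentions 
(cpu_tuple_cost 0.01, cpu_operator_cost 0.0025, and the proposed 0.005 for 
cpu_index_tuple_cost).  The GUC itself is hypothetical, of course:

    # Sketch: one "cpu_speed_scale" multiplier applied to all three cpu_* costs.
    # The base values below are assumptions taken from the discussion above.
    BASE_COSTS = {
        "cpu_tuple_cost": 0.01,
        "cpu_index_tuple_cost": 0.005,   # the proposed new default
        "cpu_operator_cost": 0.0025,
    }

    def scaled_cpu_costs(cpu_speed_scale=1.0):
        """Effective cpu_* settings for a given multiplier."""
        return {name: base * cpu_speed_scale for name, base in BASE_COSTS.items()}

    # A CPU-bound box might want everything doubled:
    print(scaled_cpu_costs(2.0))
    # {'cpu_tuple_cost': 0.02, 'cpu_index_tuple_cost': 0.01, 'cpu_operator_cost': 0.005}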


> I've desisted from touching this mainly because the obvious sanity
> adjustments, such as adding something for tree descent and charging
> random_page_cost not 1.0 for leaf page touches, would increase the
> estimated cost of index usage, which seemed like not the direction we
> wanted to go in.  So I figured that the errors were more or less
> cancelling each other.  But the addition of bitmap index scans is now
> exposing the weaknesses, so we need to face up to the problem.

Yeah.  I've refrained from proposing changes because it's a game of 
pick-up-sticks.  If we start modifying the model, we need to fix 
*everything*, not just one item, and then educate our users that they 
need to use the GUC variables in a different way.  Here are the issues I see:

1. n-distinct estimation is bad, as previously discussed;

2. our current heuristic sampling method prevents us from sampling more 
than 0.5% of any reasonably large table, which throws the statistics off 
for any table with an irregular distribution of values (see the arithmetic 
sketch after this list);

3. We don't have any method to analyze inter-column correlation within a 
table;

4. We don't keep statistics on foreign key correlation;

5. random_page_cost (as previously discussed) is actually a function of 
relatively immutable hardware characteristics, and as such should not need 
to exist as a GUC once the cost model is fixed.

6. We haven't added any way to estimate rows returned from SRFs.
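
To make item 2 concrete, here is the rough arithmetic (a sketch; the 
300 * statistics_target sample size is my understanding of how ANALYZE 
picks its sample, so treat it as an assumption):

    # Why a fixed-size sample becomes a vanishing fraction of a big table.
    # Assumption: ANALYZE samples roughly 300 * statistics_target rows
    # regardless of table size.
    def sample_fraction(table_rows, statistics_target=10):
        sample_rows = 300 * statistics_target        # ~3,000 rows at the default
        return min(sample_rows, table_rows) / table_rows

    for rows in (100_000, 10_000_000, 1_000_000_000):
        print(f"{rows:>13,} rows -> {sample_fraction(rows):.4%} sampled")
    #       100,000 rows -> 3.0000% sampled
    #    10,000,000 rows -> 0.0300% sampled
    # 1,000,000,000 rows -> 0.0003% sampled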

> I am inclined to think that a reasonable model is to continue to estimate
> the number of index leaf pages touched as above (pro-rating against the
> total number of index entries), to charge random_page_cost per leaf page
> touched, and not to count anything for metapage or tree descent.  I
> justify the latter on the grounds that upper tree levels will tend to stay
> in cache because they're touched so often.  random_page_cost per leaf page
> may be an overestimate, since sometimes logically adjacent leaf pages will
> be physically adjacent too, but not charging for tree descent should help
> to cancel that out.  With this model, the disk cost to fetch a single
> index entry will be estimated as random_page_cost (default 4.0) rather
> than the current fixed 2.0.  This shouldn't hurt things too much for
> simple indexscans --- especially since anyone running with a reduced
> random_page_cost won't see as much change.  And it will increase the costs
> for bitmap scans that cross many index pages, which is what we need in
> light of Philippe's numbers.

Per above, I think anything involving random_page_cost is the wrong way 
to go.  RPC is a GUC assigned by the DBA on pure guesswork.  We should 
be decreasing its use or eliminating it entirely, not increasing our 
reliance on it.
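
Just so we're arguing about the same model, here is my reading of the 
proposal above as a sketch (the function name and defaults are mine, not 
anything in the code):

    import math

    # Sketch of the proposed disk-cost model: pro-rate leaf pages by the
    # fraction of index entries fetched, charge random_page_cost per leaf
    # page, and charge nothing for the metapage or tree descent.
    def proposed_index_disk_cost(tuples_fetched, index_tuples, index_leaf_pages,
                                 random_page_cost=4.0):
        if index_tuples <= 0:
            return 0.0
        leaf_pages = math.ceil(index_leaf_pages *
                               min(1.0, tuples_fetched / index_tuples))
        return leaf_pages * random_page_cost

    # A single-entry probe is charged random_page_cost (4.0) instead of the
    # current fixed 2.0:
    print(proposed_index_disk_cost(1, 1_000_000, 10_000))   # -> 4.0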

For btrees, we should be able to accurately estimate the cost of the 
index access given:
a) the depth of the tree;
b) the likelihood that the requested values are adjacent;
c) whether the index and tables are already in shared_mem, or else (d) 
and (e) below:
d) the probability that the index pages are cached in memory, which is 
composed of: (1) the frequency with which that index is accessed, modified 
by (2) whether the index is a fraction of available RAM, or larger than RAM;
e) the probability that the relevant table pages are cached in memory, 
determined by the same two factors.  (A rough sketch of how these might 
combine follows below.)
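
I don't claim to have the right weights, but the *shape* of what I'm after 
is roughly this.  Every name and constant below is invented for 
illustration, and the seq/seek page costs would come from measured hardware 
ratios rather than a GUC:

    # Pure sketch of the shape of a btree access estimate built from (a)-(e).
    # All names and weights here are invented.
    def btree_access_cost(tree_depth, adjacency_prob,
                          index_cache_prob, table_cache_prob,
                          leaf_pages_touched, table_pages_touched,
                          seq_page_cost=1.0, seek_page_cost=4.0):
        # Cached pages are treated as free; uncached adjacent pages cost a
        # sequential read, uncached non-adjacent pages cost a seek.
        def io_cost(pages, cache_prob):
            uncached = pages * (1.0 - cache_prob)
            return uncached * (adjacency_prob * seq_page_cost +
                               (1.0 - adjacency_prob) * seek_page_cost)

        descent = tree_depth * (1.0 - index_cache_prob) * seek_page_cost
        return (descent +
                io_cost(leaf_pages_touched, index_cache_prob) +
                io_cost(table_pages_touched, table_cache_prob))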

*None* of this should involve a userset parameter (random_page_cost), 
since, as you pointed out months ago, the ratio of sequential to seek 
access speed has remained relatively constant over the past 6 years of 
HDD and I/O engineering.

Sadly, I don't have the math for all of the above, but I'm hoping someone 
(either here or on ACM) does.

Also, this would require us to have different *formulas* and not merely 
different cost factors for different kinds of indexes.  I believe that 
this is something that Jie is already struggling with.  Jie?

> The big difficulty in modeling cache effects from multiple scans is that
> we usually don't know how many times the index will be scanned.  If we did
> have that number, I think it'd be reasonable to use the Mackert and Lohman
> approximation already used in cost_index to estimate the total number of
> index leaf pages fetched over N scans, and then divide by N to get our
> per-scan estimated cost.  But because the planner builds cost estimates
> bottom-up, in most scenarios we don't have any idea whether an indexscan
> plan node will be iterated once or many times in the finished plan.

I think we can work around this by figuring out whether the index has 
already been scanned in the plan, something we *can* know.  So the first 
scan will be full cost and remaining scans will be fractional cost. 
Possibly not as good as Mackert/Lohman, but it allows us reasonably 
accurate *overall* estimates without needing to move away from bottom-up 
plan building.
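
For reference, my understanding of the Mackert and Lohman approximation 
that cost_index already uses is roughly the following (a sketch of the 
formula as I understand it, not the actual code); dividing by N gives the 
per-scan figure Tom describes:

    # Mackert & Lohman estimate of pages fetched when D tuples are fetched
    # (over all scans) from a relation of T pages with b buffer pages of cache.
    def mackert_lohman_pages_fetched(T, D, b):
        if D <= 0:
            return 0.0
        if T <= b:
            return min(2.0 * T * D / (2.0 * T + D), T)
        turnover = 2.0 * T * b / (2.0 * T - b)
        if D <= turnover:
            return 2.0 * T * D / (2.0 * T + D)
        return b + (D - turnover) * (T - b) / T

    # If we knew the scan count N, the per-scan estimate would be:
    N = 50
    print(mackert_lohman_pages_fetched(T=10_000, D=N * 100, b=2_000) / N)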


> Thoughts, gripes, better ideas?

Yes.  I think if we're going to fix the cost model, we should overhaul 
it and target 8.3.  Realistically, once we start tinkering with the 
cost model, most queries are going to get worse before they get better.

--Josh

