Re: A costing analysis tool - Mailing list pgsql-hackers

From Greg Stark
Subject Re: A costing analysis tool
Date
Msg-id 87zmpae7ae.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: A costing analysis tool  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: A costing analysis tool
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Hardly --- how will you choose the best subplans if you don't calculate
> their costs?

Uhm, good point. but as you say not really a problem.

> I'm also a bit suspicious of the "it's all a linear equation" premise,
> because the fact of the matter is that the cost estimates are already
> nonlinear, and are likely to get more so rather than less so as we learn
> more.  A case in point is that the reason nestloop costing sucks so
> badly at the moment is that it fails to account for cache effects in
> repeated scans ... which is definitely a nonlinear effect.

That only means the relationship between the estimates for the outside of the
nested loop to the estimates inside the loop isn't simple. But the individual
estimates for both nodes are still just linear equations themselves. That is,
the actual cost for each node is just the result of a simple linear equation
of all the parameters estimated at that node.

I think they *have* to be linear equations. If not the units can't work out
properly. Every operation takes time, and the total amount of time spent in
the query is just the sum of all the time spent in those operations. There
just aren't very many operations that make much sense on measures of time
after all.

In fact I wonder if breaking out the individual parameters would offer a way
out of the nested loop knot. If you know how much time is the plan inside the
nested loop is estimated to spend in index lookups specifically you could
discount that but keep the other parameters at full value. 

Or something like that. It might require breaking random_page_cost into two or
three different parameters that would normally have the same cost but aren't
handled the same, like random_heap_cost, random_leaf_cost, and
random_nonleaf_cost.

-- 
greg



pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: A costing analysis tool
Next
From: Tom Lane
Date:
Subject: Re: slow IN() clause for many cases