Greg Stark wrote:
> Richard Huxton <dev@archonet.com> writes:
>
>>You'd only need to log them if they diverged from expected anyway. That should
>>result in fairly low activity pretty quickly (or we're wasting our time).
>>Should they go to the stats collector rather than logs?
>
> I think you need to log them all. Otherwise when you go to analyze the numbers
> and come up with ideal values you're going to be basing your optimization on a
> skewed subset.
I can see your thinking, I must admit I was thinking of a more iterative
process: estimate deltas, change config, check, repeat. I'm not
convinced there are "ideal" values with a changing workload - for
example, random_page_cost will presumably vary depending on how much
contention there is for random seeks. Certainly, effective_cache size
could vary.
> I don't know whether the stats collector or the logs is better suited to this.
>
>>>(Also, currently explain analyze has overhead that makes this impractical.
>>>Ideally it could subtract out its overhead so the solutions would be accurate
>>>enough to be useful)
>>
>>Don't we only need the top-level figures though? There's no need to record
>>timings for each stage, just work completed.
>
> I guess you only need top level values. But you also might want some flag if
> the row counts for any node were far off. In that case perhaps you would want
> to discard the data point.
I think you'd need to adjust work-estimates by actual-rows / estimated-rows.
I _was_ trying to think of a clever way of using row mis-estimates to
correct statistics automatically. This was triggered by the discussion a
few weeks ago about hints to the planner and the recent talk about plan
cacheing. Some sort of feedback loop so the planner would know its
estimates were off should be a big win from an ease-of-use point of
view. Didn't look easy to do though :-(
--
Richard Huxton
Archonet Ltd