Tom Lane wrote:
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Martijn van Oosterhout wrote:
> >> Firstly, I was only thinking of going for the basic nodes (Index Scan, Seq
> >> Scan, Distinct). Other types have far more variables. Secondly, even if you
> >> only count, it's useful. For example, if it tells you that the planner is
> >> off by a factor of 10 more than 75% of the time, that's useful information
> >> independent of what the actual variables are.
>
> > Yes, only updating the stats if the estimate was off by a factor of 10
> > or so should cut down on the overhead.
>
> And reduce the usefulness even more ;-). As a pure stats-gathering
> exercise it might be worth doing, but not if you only log the failure
> cases. How will you know how well you are doing if you take a
> biased-by-design sample?
Sure, it is required to count all cases, success and failure. But I don't
see why that information must be constantly fed back into the
statistics tables. As long as we don't restart, it is perfectly fine in
the collector, and it must not be fed back to the backend on every
query.
Maybe ANALYZE would like to have some of that information? If memory
serves, ANALYZE does a poor job when the data isn't well distributed,
has few distinct values, and the like. That then causes wrong estimates
(among other things, of course). The idea would be to have ANALYZE take
a much closer look at tables with horrible estimates, to generate better
statistics for those.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #