All,
* Dimitri Fontaine (dimitri@2ndQuadrant.fr) wrote:
> In our case, what I keep experiencing with tuning queries is that we
> have ~99% of them running under an acceptable threshold and 1% of them
> taking more and more time.
This is usually described (at least where I come from) as 'rare events',
which goes to Tom's point that averages, stddev, etc., are not ideal
(though they are still better than nothing). A single 10s query among
999 queries at 10ms only raises the mean to ~20ms, yet that one query
is exactly the one you want to find.
> > good reason to suppose that query runtime is Gaussian? (I'd bet not;
> > in particular, multimodal behavior seems very likely due to things like
> > plan changes.) If not, how much does that affect the usefulness of
> > a standard-deviation calculation?
Oscillating plan changes may produce multimodal behavior, but I don't
feel that's typical. In my experience it's either an extremely rare plan
difference or a one-way shift from one plan to another over time.
> What I've been gathering from my quick chat this morning is that either
> you know how to characterize the distribution, and then the min, max and
> average are useful on their own, or you need to keep track of a
> histogram where all the bins are of the same size to be able to learn
> what the distribution actually is.
A histogram would certainly be useful; a sketch of the equal-width
binning is below. We may also wish to look into outlier/rare-event
detection methods and increase the logging we do in those cases (if
possible).
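
For illustration, a minimal sketch of the equal-width binning Dimitri
describes (hypothetical names and constants, not proposed
pg_stat_statements code): each sample costs one array increment, and the
counts let us recover the shape of the distribution afterwards.

#include <stdint.h>

#define HIST_NBINS   32
#define HIST_MAX_MS  10000.0    /* runtimes past this land in the last bin */

typedef struct RuntimeHistogram
{
    uint64_t bins[HIST_NBINS];
} RuntimeHistogram;

static void
hist_add(RuntimeHistogram *h, double runtime_ms)
{
    int bin = (int) (runtime_ms / (HIST_MAX_MS / HIST_NBINS));

    if (bin >= HIST_NBINS)
        bin = HIST_NBINS - 1;   /* clamp the long tail into the last bin */
    if (bin < 0)
        bin = 0;                /* guard against a negative timestamp diff */
    h->bins[bin]++;
}

The obvious catch is that HIST_MAX_MS has to be picked up front;
log-scaled bins would cover a wider dynamic range with the same storage,
at the cost of giving up the equal-size bins Dimitri mentions.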
> Of course keeping enough stats seems to always begin with keeping the
> min, max and average, so we can begin there. We would just be unable
> to answer the interesting questions with only that.
It would probably be good to do some research into outlier-detection
techniques that minimize CPU and storage cost; a constant-space sketch
of one cheap approach follows.
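
As a starting point, a minimal sketch (hypothetical names, not an
existing PostgreSQL API) using Welford's online algorithm, which keeps
min/max/mean/variance in constant space so flagging a sample costs a few
arithmetic ops per query:

#include <math.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct RunningStats
{
    uint64_t n;
    double   min_ms;
    double   max_ms;
    double   mean;
    double   m2;        /* sum of squared deviations from the mean */
} RunningStats;

/*
 * Fold one runtime into the stats; return true if it looked like an
 * outlier (more than k stddevs above the mean seen so far).
 */
static bool
stats_add_and_check(RunningStats *s, double runtime_ms, double k)
{
    bool   is_outlier = false;
    double delta;

    /* Test against the stats accumulated before this sample. */
    if (s->n > 1)
    {
        double stddev = sqrt(s->m2 / (s->n - 1));

        is_outlier = runtime_ms > s->mean + k * stddev;
    }

    s->n++;
    if (s->n == 1 || runtime_ms < s->min_ms)
        s->min_ms = runtime_ms;
    if (s->n == 1 || runtime_ms > s->max_ms)
        s->max_ms = runtime_ms;

    /* Welford's update: incremental mean and sum of squared deviations. */
    delta = runtime_ms - s->mean;
    s->mean += delta / s->n;
    s->m2 += delta * (runtime_ms - s->mean);

    return is_outlier;
}

The mean + k*stddev test inherits Tom's caveat, of course: if the
distribution is multimodal, a second mode can sit well within k stddevs,
so this is a cheap first filter rather than a real detector.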
Thanks,
Stephen