Tom Lane <tgl@sss.pgh.pa.us> writes:
> Hm. It's been a long time since college statistics, but doesn't the
> entire concept of standard deviation depend on the assumption that the
> underlying distribution is more-or-less normal (Gaussian)? Is there a
I just had a quick chat with a statistician friends of mine on that
topic, and it seems that the only way to make sense of an average is if
you know already the distribution.
In our case, what I keep experiencing with tuning queries is that we
have like 99% of them running under acceptable threshold and 1% of them
taking more and more time.
In a normal (Gaussian) distribution, there would be no query time
farther away from the average than any other, so my experience tells me
that the query time distribution is anything BUT normal (Gaussian).
> good reason to suppose that query runtime is Gaussian? (I'd bet not;
> in particular, multimodal behavior seems very likely due to things like
> plan changes.) If not, how much does that affect the usefulness of
> a standard-deviation calculation?
I don't know what multi-modal is.
What I've been gathering from my quick chat this morning is that either
you know how to characterize the distribution and then the min max and
average are useful on their own, or you need to keep track of an
histogram where all the bins are of the same size to be able to learn
what the distribution actually is.
We didn't get to the point where I could understand if storing histogram
with a constant size on log10 of the data rather than the data itself is
going to allow us to properly characterize the distribution.
The main question I want to answer here would be the percentiles one, I
want to get the query max execution timing for 95% of the executions,
then 99%, then 99.9% etc. There's no way to answer that without knowing
the distribution shape, so we need enough stats to learn what the
distribution shape is (hence, histograms).
Of course keeping enough stats seems to always begin with keeping the
min, max and average, so we can just begin there. We would just be
unable to answer interesting questions with just that.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support