On Tue, Oct 22, 2013 at 2:56 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Hm. It's been a long time since college statistics, but doesn't the
>> entire concept of standard deviation depend on the assumption that the
>> underlying distribution is more-or-less normal (Gaussian)? Is there a
>
> I just had a quick chat with a statistician friend of mine on that
> topic, and it seems that the only way to make sense of an average is if
> you already know the distribution.
>
> In our case, what I keep experiencing with tuning queries is that we
> have like 99% of them running under an acceptable threshold and 1% of
> them taking more and more time.
Agreed.
In a lot of Heroku's performance work, the Perc99 and Perc95 have
provided a lot more value than stddev, although stddev is a lot better
than nothing and probably easier to implement.
There are apparently high-quality statistical approximations of these
that are inexpensive to compute and compact in memory.
That said, I'd take stddev over nothing for sure.
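To illustrate the general shape of the thing (this is not one of the
fancier estimators I was alluding to, like P-squared or t-digest, just
the simplest bounded-memory approach, and every name in it is made up),
an approximate percentile can be read off a fixed-size reservoir sample
of execution times:

/*
 * Sketch only: fixed-size reservoir sample of execution times, from
 * which approximate percentiles can be read.  All names are made up.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RESERVOIR_SIZE 1000

typedef struct
{
    double  samples[RESERVOIR_SIZE];
    long    seen;               /* total observations fed in so far */
} Reservoir;

static void
reservoir_add(Reservoir *r, double x)
{
    if (r->seen < RESERVOIR_SIZE)
        r->samples[r->seen] = x;
    else
    {
        long    j = random() % (r->seen + 1);

        if (j < RESERVOIR_SIZE)
            r->samples[j] = x;  /* replace a random existing sample */
    }
    r->seen++;
}

static int
cmp_double(const void *a, const void *b)
{
    double  da = *(const double *) a;
    double  db = *(const double *) b;

    return (da > db) - (da < db);
}

/* approximate p-th percentile (p in [0,1]) from the reservoir */
static double
reservoir_percentile(Reservoir *r, double p)
{
    long    n = r->seen < RESERVOIR_SIZE ? r->seen : RESERVOIR_SIZE;
    double  sorted[RESERVOIR_SIZE];

    if (n < 1)
        return 0.0;
    memcpy(sorted, r->samples, n * sizeof(double));
    qsort(sorted, n, sizeof(double), cmp_double);
    return sorted[(long) (p * (n - 1))];
}

int
main(void)
{
    Reservoir   r = {{0}, 0};
    int         i;

    /* feed in 100k fake execution times uniform on [0, 100) ms */
    for (i = 0; i < 100000; i++)
        reservoir_add(&r, (double) (random() % 1000) / 10.0);

    printf("approx p95: %g ms\n", reservoir_percentile(&r, 0.95));
    printf("approx p99: %g ms\n", reservoir_percentile(&r, 0.99));
    return 0;
}

The better-behaved estimators keep even less memory than this, but the
bounded-memory idea is the same.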
Handily for stddev, I think that by snapshotting count(x), sum(x), and
sum(x**2) (which I understand to be the components of stddev), one can
compute stddevs across different time spans using auxiliary tools that
sample this triplet on occasion. That's a handy property that I'm not
sure percN approximations can get too easily.
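To make that concrete, here's a rough sketch (hypothetical struct and
field names, not actual pg_stat_statements columns) of how an external
sampler could derive a stddev for just the interval between two
snapshots of that triplet, by subtracting the cumulative counters:

/*
 * Sketch only: stddev of execution time for the interval between two
 * snapshots of the cumulative (count, sum, sum of squares) triplet.
 */
#include <math.h>
#include <stdio.h>

typedef struct
{
    double  count;      /* number of executions */
    double  sum;        /* sum of execution times */
    double  sum_sq;     /* sum of squared execution times */
} StatSnapshot;

static double
interval_stddev(const StatSnapshot *older, const StatSnapshot *newer)
{
    double  n = newer->count - older->count;
    double  s = newer->sum - older->sum;
    double  ss = newer->sum_sq - older->sum_sq;
    double  mean;

    if (n < 2)
        return 0.0;
    mean = s / n;
    /* sample variance; clamp tiny negatives that rounding can produce */
    return sqrt(fmax((ss - n * mean * mean) / (n - 1), 0.0));
}

int
main(void)
{
    /* two snapshots taken, say, five minutes apart */
    StatSnapshot older = {1000, 5000.0, 26000.0};
    StatSnapshot newer = {1500, 7600.0, 40300.0};

    printf("stddev over the interval: %g ms\n",
           interval_stddev(&older, &newer));
    return 0;
}

Since the counters only ever accumulate, any window whose endpoints you
happened to sample can be recovered by subtraction; percentile sketches
generally don't compose by subtraction like that.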