On Fri, Aug 07, 2009 at 02:08:21PM -0500, Kevin Grittner wrote:
> With the 20 samples from that last round of tests, the answer (rounded
> to the nearest percent) is 60%, so "probably noise" is a good summary.
> Combined with the 12 samples from earlier comparable runs with the
> prior version of the patch, it goes to a 90% probability that noise
> would generate a difference at least that large, so I think we've
> gotten to "almost certainly noise". :-)
>
> To me, that seems more valuable for this situation than saying "we
> haven't reached 90% confidence that it's a real difference." I used
> the same calculations up through the t-statistic.
The stats people in our group just tend to say that things are
significant or not at a specific level; I've never bothered to find out
why, but I'll ask someone when I get a chance.
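For what it's worth, if you want that "probability that noise would
generate a difference at least that large" directly, R's t.test() hands
you the two-sided p-value in one go. A minimal sketch of calling it
from pl/r, assuming the plr language is installed (the function name
and float8[] signature are just for illustration):

    CREATE OR REPLACE FUNCTION noise_prob(a float8[], b float8[])
    RETURNS float8 AS $$
      # two-sided p-value from the equal-variance two-sample t-test
      t.test(a, b, var.equal = TRUE)$p.value
    $$ LANGUAGE 'plr';

That's the probability of seeing a difference in means at least as
large as the observed one if both sets of samples really came from the
same distribution.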
> The one question I have left for this technique is why you went with
>
> ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> instead of
> ((avg1 - avg2) / (stddev / sqrt(samples)))
I was just doing a literal translation of what was on the Wikipedia
page:
http://en.wikipedia.org/wiki/Student's_t-test#Independent_two-sample_t-test
If you really want to find out, there are much better implementations
available in R, which you can get at from PG via the pl/r language.
I'd trust R much more than Wikipedia, but for things like this
Wikipedia is reasonable.
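A literal translation of that page into pl/r would look something like
the sketch below (again assuming plr is installed; equal sample sizes
and a pooled standard deviation, as in the formula you quoted):

    CREATE OR REPLACE FUNCTION t_stat(x float8[], y float8[])
    RETURNS float8 AS $$
      # equal-size, equal-variance form: pooled sd times sqrt(2/n)
      n  <- length(x)
      sp <- sqrt((var(x) + var(y)) / 2)
      (mean(x) - mean(y)) / (sp * sqrt(2 / n))
    $$ LANGUAGE 'plr';

You can sanity-check it against R's built-in
t.test(x, y, var.equal = TRUE)$statistic, which implements the same
formula.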
> I assume that it's because the baseline was a set of samples rather
> than a fixed mark, but I couldn't pick out a specific justification
> for this in the literature (although I might have just missed it), so
> I'd feel more comfy if you could clarify.
Sorry, that's about my limit! I've never studied stats; I'm a computer
science person who just happens to be around people who use stats on a
day-to-day basis, and who thinks it needs more use in the software
world. I think you're right: you're aggregating the errors from two
(assumed independent) datasets, hence you want to keep a bit more of
the error in there. As to the formal justification (and presumably a
proof) I've no real idea.
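Concretely, and this is a standard result (assuming equal variances and
equal sample sizes): the variance of the difference of two independent
sample means is the sum of their variances,

    Var(avg1 - avg2) = Var(avg1) + Var(avg2) = s^2/n + s^2/n = 2*s^2/n

so the standard error of the difference is s*sqrt(2/n), which is where
the sqrt(2/samples) comes from. s/sqrt(n) would be the right
denominator if you were comparing one sample mean against a fixed
reference value rather than against another noisy mean.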
> Given the convenience of capturing benchmarking data in a database,
> has anyone tackled implementation of something like the spreadsheet
> TDIST function within PostgreSQL?
Again, pl/r is what you want!
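For the TDIST part specifically, a minimal sketch of what that could
look like in pl/r (assuming the plr language is installed; the function
name and signature are just illustrative, mimicking the spreadsheet
convention of a non-negative t plus a tails argument):

    CREATE OR REPLACE FUNCTION tdist(t float8, df float8, tails integer)
    RETURNS float8 AS $$
      # upper-tail probability of the t-distribution, times number of tails
      tails * pt(abs(t), df, lower.tail = FALSE)
    $$ LANGUAGE 'plr';

    -- SELECT tdist(2.086, 20, 2); should come out around 0.05, the usual
    -- two-sided 5% cutoff for 20 degrees of freedom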
--
  Sam  http://samason.me.uk/