Re: Review: Revise parallel pg_restore's scheduling heuristic - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Review: Revise parallel pg_restore's scheduling heuristic
Msg-id 4A7C3555020000250002967D@gw.wicourts.gov
In response to Re: Review: Revise parallel pg_restore's scheduling heuristic  (Sam Mason <sam@samason.me.uk>)
Responses Re: Review: Revise parallel pg_restore's scheduling heuristic  (Robert Haas <robertmhaas@gmail.com>)
Re: Review: Revise parallel pg_restore's scheduling heuristic  (Sam Mason <sam@samason.me.uk>)
List pgsql-hackers
Sam Mason <sam@samason.me.uk> wrote: 
> All we're saying is that we're less than 90% confident that there's
> something "significant" going on.  All the fiddling with standard
> deviations and sample sizes is just easiest way (that I know of)
> that statistics currently gives us of determining this more formally
> than a hand-wavy "it looks OK to me".  Science tells us that humans
> are liable to say things are OK when they're not, as well as vice
> versa; statistics gives us a way to work past these limitations in
> some common and useful situations.
Following up, I took the advice offered in the referenced article, and
used a spreadsheet with a TDIST function for more accurate results
than are available through the table included in the article.  That
allows what I think is a more meaningful number: the probability that
a sample of that size would have produced a t-statistic at least as
large as the one actually observed if there were no real difference.
With the 20 samples from that last round of tests, the answer (rounded
to the nearest percent) is 60%, so "probably noise" is a good summary.
Combined with the 12 samples from earlier comparable runs with the
prior version of the patch, it goes to a 90% probability that noise
would generate a difference at least that large, so I think we've
gotten to "almost certainly noise".  :-)
To me, that seems more valuable for this situation than saying "we
haven't reached 90% confidence that it's a real difference."  I used
the same calculations up through the t-statistic.
The one question I have left for this technique is why you went with
((avg1 - avg2) / (stddev * sqrt(2/samples)))
instead of
((avg1 - avg2) / (stddev / sqrt(samples)))
I assume that it's because the baseline was a set of samples rather
than a fixed mark, but I couldn't pick out a specific justification
for this in the literature (although I might have just missed it), so
I'd feel more comfy if you could clarify.
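For concreteness, the two denominators correspond to the one-sample
t-statistic (sample mean against a fixed mark) and the equal-size
two-sample t-statistic with a pooled standard deviation, where
s_p * sqrt(1/n + 1/n) collapses to s_p * sqrt(2/n).  A small sketch
(Python used purely for illustration; the timing numbers are made up):

```python
import math
from statistics import mean, stdev

def t_one_sample(xs, mark):
    """One-sample t: sample mean versus a fixed baseline value."""
    n = len(xs)
    return (mean(xs) - mark) / (stdev(xs) / math.sqrt(n))

def t_two_sample_equal_n(xs, ys):
    """Two-sample t with equal sizes and a pooled standard deviation.

    With n1 == n2 == n, s_p * sqrt(1/n1 + 1/n2) reduces to
    s_p * sqrt(2/n), which is the form in question.
    """
    n = len(xs)
    assert len(ys) == n, "equal sample sizes assumed"
    sp = math.sqrt((stdev(xs) ** 2 + stdev(ys) ** 2) / 2)  # pooled sd, equal n
    return (mean(xs) - mean(ys)) / (sp * math.sqrt(2 / n))

# hypothetical restore timings in seconds, both runs sampled
base = [41.2, 40.8, 42.1, 41.5, 40.9]
patched = [40.7, 41.0, 41.8, 40.5, 41.3]
t = t_two_sample_equal_n(base, patched)
```

The intuition is that when the baseline is itself a set of noisy
samples rather than a known constant, its variance also contributes to
the variance of the difference of means, which is where the extra
factor of 2 comes from.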
Given the convenience of capturing benchmarking data in a database,
has anyone tackled implementation of something like the spreadsheet
TDIST function within PostgreSQL?
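For anyone curious what such a function would involve: a TDIST-style
two-tailed probability needs nothing beyond a log-gamma function, so
it could plausibly be done even in PL/pgSQL.  A rough numerical sketch
(Python for illustration, Simpson's rule over the central region; not
production-grade numerics):

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom.

    Uses lgamma rather than gamma so large df doesn't overflow.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    return c / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def tdist_two_tailed(t, df, steps=10000):
    """Approximate the spreadsheet TDIST(t, df, 2): P(|T| >= t).

    Integrates the density over [-t, t] with Simpson's rule (steps
    must be even) and subtracts from 1, avoiding the unbounded tails.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = 2 * t / steps
    total = t_pdf(-t, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-t + i * h, df)
    central = total * h / 3
    return max(0.0, 1 - central)

# e.g. two samples of 20 each give 2*20 - 2 = 38 degrees of freedom
p = tdist_two_tailed(0.53, 38)
```

A C-language function wrapping an existing math library would of
course be both faster and more accurate than this kind of
brute-force quadrature.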
-Kevin

