On Tue, May 10, 2016 at 10:05:13AM -0500, Kevin Grittner wrote:
> On Tue, May 10, 2016 at 9:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Kevin Grittner <kgrittn@gmail.com> writes:
> >> There were 75 samples each of "disabled" and "reverted" in the
> >> spreadsheet. Averaging them all, I see this:
> >
> >> reverted: 290,660 TPS
> >> disabled: 292,014 TPS
> >
> >> That's a 0.46% overall increase in performance with the patch,
> >> disabled, compared to reverting it. I'm surprised that you
> >> consider that to be a "clearly measurable difference". I mean, it
> >> was measured and it is a difference, but it seems to be well within
> >> the noise. Even though it is based on 150 samples, I'm not sure we
> >> should consider it statistically significant.
> >
> > You don't have to guess about that --- compare it to the standard
> > deviation within each group.
>
> My statistics skills are rusty, but I thought that just gives you
> an effect size, not any idea of whether the effect is statistically
> significant.
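Tom's suggestion and Kevin's question can be made concrete. This is a minimal sketch that assumes you have the individual per-run TPS samples for each group (the 75 "reverted" and 75 "disabled" values, which are not reproduced in this message). The function names are hypothetical. Cohen's d scales the difference in means by the pooled within-group standard deviation, so it is an effect size and does not depend on how many runs were taken. Welch's t divides the same difference by its standard error, which shrinks as the sample count grows, and that is the quantity a significance test compares against a critical value.

```python
import math
import statistics


def pct_change(baseline, candidate):
    """Relative change of candidate vs. baseline mean TPS, in percent."""
    return (candidate - baseline) / baseline * 100.0


def cohens_d(a, b):
    """Effect size: difference in means over the pooled standard deviation.
    Independent of sample count."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd


def welch_t(a, b):
    """Welch's t statistic: difference in means over its standard error.
    Grows with sample count, which is what a significance test relies on."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se
```

For example, pct_change(290660, 292014) gives about 0.466, the overall difference quoted above. With equal group sizes n, t equals d * sqrt(n/2), so with 75 runs per group t is about 6.1 d; a two-sided test at the 5% level needs |t| of roughly 2, i.e. d above roughly 0.32.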

I discourage focusing on the statistical significance, because the hypothesis
in question ("Applying revert.patch to 4bbc1a7e decreases 'pgbench -S -M
prepared -j N -c N' tps by 0.46%.") is already an unreliable proxy for
anything we care about. PostgreSQL performance varies by roughly +/-5% due to
incidental, ephemeral shifts in binary layout (where code and data happen to
land after unrelated changes). Even with perfect confidence that
4bbc1a7e+revert.patch is 0.46% slower than 4bbc1a7e, the long-term effect of
revert.patch could be anywhere from about -5% to +4%.
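The arithmetic behind that range, as a minimal sketch. It assumes the layout swing is a symmetric +/- band simply added to the measured effect, which is a simplification; the function names are hypothetical.

```python
def long_term_range(measured_pct, layout_noise_pct):
    """Plausible long-term effect of a patch: the measured effect widened by a
    symmetric +/- band of swing attributable to binary layout alone."""
    return (measured_pct - layout_noise_pct, measured_pct + layout_noise_pct)


def sign_is_known(measured_pct, layout_noise_pct):
    """True only if the whole range lies on one side of zero."""
    lo, hi = long_term_range(measured_pct, layout_noise_pct)
    return lo > 0 or hi < 0
```

long_term_range(-0.46, 5.0) gives about (-5.46, 4.54), and sign_is_known(-0.46, 5.0) is False: with a +/-5% layout band, a measured -0.46% does not even settle the direction of the long-term effect.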

If one wishes to make benchmark-driven decisions about single-digit
performance changes, one must control for binary layout effects:
http://www.postgresql.org/message-id/87vbitb2zp.fsf@news-spur.riddles.org.uk
http://www.postgresql.org/message-id/20160416204452.GA1910190@tornado.leadboat.com

nm