Re: Review: Revise parallel pg_restore's scheduling heuristic - Mailing list pgsql-hackers

From	Kevin Grittner
Subject	Re: Review: Revise parallel pg_restore's scheduling heuristic
Date	August 7, 2009 12:19:41
Msg-id	4A7BFFA8020000250002965A@gw.wicourts.gov
In response to	Re: Review: Revise parallel pg_restore's scheduling heuristic (Sam Mason <sam@samason.me.uk>)
Responses	Re: Review: Revise parallel pg_restore's scheduling heuristic
List	pgsql-hackers

Sam Mason <sam@samason.me.uk> wrote: 
> What do people do when testing this?  I think I'd look to something
> like Student's t-test to check for statistical significance.  My
> working would go something like:
> 
>   I assume the variance is the same because it's being tested on the
>   same machine.
> 
>   samples = 20
>   stddev  = 144.26
>   avg1    = 4783.13
>   avg2    = 4758.46
>   t       = 0.54  ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> 
> We then have to choose how certain we want to be that they're
> actually different, 90% is a reasonably easy level to hit (i.e. one
> part in ten, with 95% being more commonly quoted).  For 20 samples
> we have 19 degrees of freedom--giving us a cut-off[1] of 1.328. 
> 0.54 is obviously well below this allowing us to say that there's no
> "statistical significance" between the two samples at a 90% level.

Thanks for the link; that looks useful.  To confirm that I understand
what this has established (or get a bit of help putting it in
perspective), what this says to me, in the least technical jargon I
can muster, is "With this many samples and this degree of standard
deviation, the average difference is not large enough to have a 90%
confidence level that the difference is significant."  In fact,
looking at the chart, it isn't enough to reach a 75% confidence level
that the difference is significant.  Significance here would seem to
mean that at least the given percentage of the time, picking this many
samples from an infinite set with an average difference that really
was this big or bigger would generate a value for t this big or
bigger.

Am I close?

I like to be clear, because it's easy to get confused and take the
above to mean that there's a 90% confidence that there is no actual
significant difference in performance based on that sampling.  (Given
Tom's assurance that this version of the patch should have similar
performance to the last, and the samples from the prior patch went the
other direction, I'm convinced there is not a significant difference,
but if I'm going to use the referenced calculations, I want to be
clear how to interpret the results.)
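
For reference, the arithmetic in the quoted calculation can be reproduced
with a short sketch.  It assumes, as the quoted text does, that both runs
have the same number of samples and share one standard deviation; the
function name pooled_t is made up for illustration, and the cut-off is
copied from the quoted message rather than recomputed.

```python
import math

def pooled_t(avg1, avg2, stddev, samples):
    """t statistic for two equal-sized samples sharing one standard deviation."""
    # Standard error of the difference between the two means.
    standard_error = stddev * math.sqrt(2 / samples)
    return (avg1 - avg2) / standard_error

# Figures taken from the quoted message.
t = pooled_t(avg1=4783.13, avg2=4758.46, stddev=144.26, samples=20)

cutoff = 1.328  # copied from the quoted message, not recomputed here
significant = abs(t) > cutoff

print(f"t = {t:.2f}, exceeds the quoted cut-off: {significant}")
```

Failing to exceed the cut-off only means the test could not detect a
difference with these samples; it does not show that the two runs perform
the same.
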
-Kevin