Just wondering how everyone is doing aggregation of production data.
Volume runs in the vicinity of a few million rows a week.
What methods will efficiently provide the
min/max/average/count/stdev of the weekly sample, broken down by
product/software mix etc.,
and still be able to answer correctly: what's the average of data_1 over
the past 2 months?
(I can't just take the average of the 8 weekly averages.)
eg:
wk avg data_1
w1 - 2
w2 - 2
w3 - 2
w4 - 3
w5 - 1
w6 - 2
w7 - 2
w8 - 2
Taking the average of the past 2 months as avg(w1..w8) is statistically
wrong, since the weeks have different sample sizes.
Using the sum of data_1 per week (together with the count) would work,
though. Please share your expertise / experience.
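To make that concrete, here's a quick Python sketch with made-up weekly counts (chosen so the weekly averages match the table above):

```python
# Hypothetical weekly rollups as (count, sum) pairs -- the figures are invented,
# but the weekly averages (2, 2, 2, 3, 1, 2, 2, 2) match the table above.
weeks = [(100, 200.0), (50, 100.0), (200, 400.0), (300, 900.0),
         (80, 80.0), (120, 240.0), (60, 120.0), (90, 180.0)]

# Wrong: average the 8 weekly averages (ignores the differing week sizes).
avg_of_avgs = sum(s / n for n, s in weeks) / len(weeks)

# Right: carry count and sum forward, divide once at the end.
total_n = sum(n for n, s in weeks)
total_s = sum(s for n, s in weeks)
true_avg = total_s / total_n

print(avg_of_avgs, true_avg)  # the two disagree when week sizes differ
```

With these numbers the average-of-averages comes out as 2.0 while the true weighted average is 2.22, so the two methods really do diverge.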
(Getting min/max/count isn't much of an issue; stdev is the main issue, I
believe.)
One approach I've read about, from this website (it references SQL Server
Analysis Services, but I think the concept is the same):
http://www.phptr.com/articles/printerfriendly.asp?p=337135&rl=1
1. Calculate the sum of the squares of each sale
2. Multiply the result of step 1 by the sales count
3. Sum all sales
4. Square the result of step 3
5. Subtract the result of step 4 from the result of step 2
6. Multiply the sales count by one less than the sales count ("sales_count"
* ("sales_count" - 1))
7. Divide the result of step 5 by the result of step 6
8. Stdev is the square root of the result of step 7
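If I sketch those steps in Python (made-up sales figures, purely to check the arithmetic against the library's sample stdev):

```python
import math
import statistics

sales = [4.0, 7.0, 1.0, 9.0, 3.0]   # invented sample data

n = len(sales)
step1 = sum(x * x for x in sales)    # 1. sum of squares of each sale
step2 = step1 * n                    # 2. times the sales count
step3 = sum(sales)                   # 3. sum of all sales
step4 = step3 ** 2                   # 4. squared
step5 = step2 - step4                # 5. step 2 minus step 4
step6 = n * (n - 1)                  # 6. n * (n - 1)
stdev = math.sqrt(step5 / step6)     # 7-8. divide, then square root

# Matches the library's sample standard deviation (n - 1 divisor).
assert abs(stdev - statistics.stdev(sales)) < 1e-9
```

So the steps do agree numerically with the usual sample stdev, which is consistent with what I saw against my actual data.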
The results are valid (verified with actual data) but I don't understand
the logic. All the statistics books I've read define stdev as
sqrt(sum((x - avg(x))^2) / (n - 1)). That formula looks very different,
hence the confusion.