Q:Aggregrating Weekly Production Data. How do you do it? - Mailing list pgsql-general

Just wondering how everyone is doing aggregration of production data.

Data which runs in the vicinity of a few million a week.

What are the methods which will effectively provide the
min/max/average/count/stdev of the weekly sample size based on different
products/software mix etc.

and still be able to answer correctly, what's the average of data_1 over
the pass 2 months?

I can't just take the average of an 8 averages of each week)

eg:
wk   avg data_1
w1 - 2
w2 - 2
w3 - 2
w4 - 3
w5 - 1
w6 - 2
w7 - 2
w8 - 2
average of past 2 months = ave(w1-w8) which is statistically wrong.

using sum of data_1 per week would work though. Please share your
expertise / experience.

(getting min/max/count isn't much of an issue. Stdev is the main issue I
believe)

One such instance I've read about is..

From this website : (it references using SQL Server Analysis services
but I think the concept is the same)

http://www.phptr.com/articles/printerfriendly.asp?p=337135&rl=1

1. Calculate sum of square of each sale
2. multiple the result of step 1 by the sales count
3. sum all sales
4. Square the result of step 3
5. Substract the result of step 4 from the result of step 2
6. Multiply the sales count by one less than sales count ("sales_count"
* ("sales_count" - 1))
7. Divide the result of step 5 by the result of step 6
8. Stdev will be the square root of step 7

The results are valid (verified with actual data) but I don't understand
the logic. All the Statistical books I've read marked stdev as sqrt
(sum(x - ave(x))^2 / (n - 1). The formula is very different, hence the
confusion.


pgsql-general by date:

Previous
From: Ow Mun Heng
Date:
Subject: Re: New/Custom DataType - Altering definition / seeing definition in pgAdmin3
Next
From: "Phoenix Kiula"
Date:
Subject: For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER