A long time ago, I emailed this list about calculating
a geometric mean in PostgreSQL. Creating a custom
function didn't work because the process of
multiplying the values from each record resulted in
numbers that exceeded the limits for the size of a
number very quickly when dealing with large
populations.
I have learned, since, that you can achieve the same
end by replacing certain steps with log functions.
(Someone who is very good at math showed me this -- I
just tested the results and wrote the sql.) This
method has 2 great benefits:
1. The method pushes the limits of deriving geometric
mean calculations considerably.
2. The default installation of PostgreSQL has
everything needed to perform the calculation.
The sql statement below calculates the geometric mean
of the lengths of stay (gm_los) for patients, grouped
by diagnostic related group and fiscal year.
The population (cases) and average length of stay
(avg_los) are also reported.
Note 1. Make sure you are calculating geometric mean
on a data type that has values to the right of the
decimal point.
Note 2. You cannot use a log function on a value <= 0.
Thus, I filtered for los > 0.
select drg_no, fy, count(pt_id) as cases,
avg(los) as avg_los,
exp(sum(ln(los::real)) * (1.0/count(pt_id))) as gm_los
from case_current where los > 0
group by drg_no, fy;
Have fun!
Andrew Gould