David Orme <d.orme@imperial.ac.uk> writes:
> I know people have asked about this before but I can't find a working
> solution on the web - I've found code for specific instances of
> calculating medians but I need a general aggregate function for
> calculating medians, or more generally any given quantile.
> The kind of thing I need to do is to be able to extract the median
> value from a table of 4 million rows, aggregating across more than
> 50,000 grouping values - the sort of thing that is really easy to do
> for averaging:
> SELECT grid_id, avg(rs) FROM behr_grid GROUP BY grid_id;
You could build a custom aggregate for this. array_append()
will do fine as the transition function, so all you really need to
write is a final function that sorts the given array and then picks
out the middle (or appropriate-quantile) element.
In fact, you could cheat a bit and let the system do the sorting for
you:
SELECT grid_id, myagg(rs)
FROM (SELECT grid_id, rs FROM behr_grid ORDER BY grid_id, rs) ss
GROUP BY grid_id;
If the aggregate is only used in a context like this, it will always
see presorted input and so it can just pull out the middle element.
(Note: I think this trick only works in PG 7.4 and later.)
So, lightly tested:
regression=# create function get_middle(anyarray) returns anyelement as
regression-# 'declare n integer;
regression'# begin
regression'# n := (array_lower($1, 1) + array_upper($1, 1)) / 2;
regression'# return $1[n];
regression'# end' language plpgsql;
CREATE FUNCTION
regression=# create aggregate sortedmedian(
regression(# sfunc = array_append,
regression(# finalfunc = get_middle,
regression(# basetype = anyelement,
regression(# stype = anyarray,
regression(# initcond = '{}'
regression(# );
CREATE AGGREGATE
regression=# select hundred, min(thousand), max(thousand), sortedmedian(thousand) from
regression-# (select hundred, thousand from tenk1 order by 1,2) ss
regression-# group by hundred;
hundred | min | max | sortedmedian
---------+-----+-----+--------------
0 | 0 | 900 | 400
1 | 1 | 901 | 401
2 | 2 | 902 | 402
3 | 3 | 903 | 403
4 | 4 | 904 | 404
5 | 5 | 905 | 405
6 | 6 | 906 | 406
7 | 7 | 907 | 407
8 | 8 | 908 | 408
9 | 9 | 909 | 409
10 | 10 | 910 | 410
...
regards, tom lane