Re: finding medians - Mailing list pgsql-hackers

From Tom Lane
Subject Re: finding medians
Date
Msg-id 7896.1022795153@sss.pgh.pa.us
Whole thread Raw
In response to finding medians  (Josh Burdick <jburdick@gradient.cis.upenn.edu>)
List pgsql-hackers
Josh Burdick <jburdick@gradient.cis.upenn.edu> writes:
> illustrates the limitations of the current aggregate function setup, 
> which works so nicely for avg() and stddev().
>     I don't have any good solutions.  I tried using a float4[] to store 
> each element as it's added, but I couldn't get array updates working in 
> PL/PgSQL, so that didn't help.
>     Perhaps aggregate functions could be passed an array?  Or a cursor, 
> pointing at the first line?  I'm not sure.

I don't think that would help.  The real problem here is the amount of
internal storage needed.  AFAIK there are no exact algorithms for finding
the median that require less than O(N) workspace for N input items.
Your "hack" with a temporary table is not a bad approach if you want to
work for large N.

There are algorithms out there for finding approximate medians using
limited workspace; it might be reasonable to transform one of these into
a Postgres aggregate function.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Katherine Ward
Date:
Subject: Small changes to facilitate Win32 port
Next
From: "Christopher Kings-Lynne"
Date:
Subject: Re: Small changes to facilitate Win32 port