Re: BUG #4916: wish: more statistical functions (median, percentiles etc) - Mailing list pgsql-bugs

From Jasen Betts
Subject Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date
Msg-id h3a6kq$ekn$1@reversiblemaps.ath.cx
Whole thread Raw
In response to BUG #4916: wish: more statistical functions (median, percentiles etc)  ("Richard Neill" <rn214@cam.ac.uk>)
List pgsql-bugs
On 2009-07-10, Richard Neill <rn214@cam.ac.uk> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      4916
> Logged by:          Richard Neill
> Email address:      rn214@cam.ac.uk
> PostgreSQL version: 8.4
> Operating system:   Linux
> Description:        wish: more statistical functions (median, percentiles
> etc)
> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc),  it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.
>
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.
>
> Thanks - Richard

percentile isn't really an agregate function as its value is mainly
determined by two (or fewer) values in the data set.

here's a function that works on arbitrarily named tables,
by building queries to get the needed statistics and then
using order by, offset, limit. Exploiting
an index (if present) to retrtieve the data values.

it won't work (well) on subqueries, but as 'col' and tbl are substitutd in
unescaped you can use joins and expressions if needed (expect a
performance hit if you do!)

-- percentile function:
-- usage example: select percentile('sometable','somecolumn','true', 0.45);
-- 'true' is a where constraint, 0.45 gets the 45th percentile.
--
-- for best preformance have an index that covers the where condition
-- and col.

CREATE OR REPLACE FUNCTION percentile ( tbl text, col text, whr text, frc double precision ) returns double precision
as$$ 
    DECLARE
        cnt integer;  -- count of records
        rat double precision; -- ratio
        pos double precision; -- postion
        res double precision; -- result
    BEGIN
        EXECUTE 'select count( '|| col ||' ) FROM '|| tbl ||' WHERE '|| whr INTO cnt;
        pos=frc*(cnt-1);
        rat=mod(pos::numeric,1);
        EXECUTE 'SELECT (SELECT '||col||' * '||(1-rat)::text
            || ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
            || 'OFFSET ' || floor(pos)::text || ' LIMIT 1 )'
            || ' + (SELECT '||col||' * '||rat::text
            || ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
            || 'OFFSET ' || ceil(pos)::text || ' LIMIT 1 )' into RES;
        RETURN res;
    END;
$$ LANGUAGE PLPGSQL;

pgsql-bugs by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Next
From: Joe Conway
Date:
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)