Thread: BUG #4916: wish: more statistical functions (median, percentiles etc)

BUG #4916: wish: more statistical functions (median, percentiles etc)

From
"Richard Neill"
Date:
The following bug has been logged online:

Bug reference:      4916
Logged by:          Richard Neill
Email address:      rn214@cam.ac.uk
PostgreSQL version: 8.4
Operating system:   Linux
Description:        wish: more statistical functions (median, percentiles
etc)
Details:

In addition to the existing aggregate functions (avg, stddev etc),  it would
be nice if postgres could return further information. For example, the
quartiles, percentiles, and median.

[mode would also be useful, as an explicit function, though we can get it
easily enough using count(1) order by count desc].

According to google, this has been a wish since at least year 2000 for
various people, but doesn't seem to be implemented.

Thanks - Richard

Re: BUG #4916: wish: more statistical functions (median, percentiles etc)

From
Joshua Tolley
Date:
On Fri, Jul 10, 2009 at 11:37:46PM +0000, Richard Neill wrote:
> In addition to the existing aggregate functions (avg, stddev etc),  it wo=
uld
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.=20=20
>=20
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>=20
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.=20

That's because no one has yet taken the time. However, patches are welcome,=
 if
you'd like it enough to implement it. It's on my list of things that might =
be
interesting to write, for example, but there are other things higher up on
that list.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: BUG #4916: wish: more statistical functions (median, percentiles etc)

From
Joshua Tolley
Date:
On Sat, Jul 11, 2009 at 02:12:22AM +0100, Richard Neill wrote:
> Thanks for your reply. Sadly, I haven't the time (or expertise) to write=
=20=20
> this myself. However, the feature would be really useful to have. I'd=20=
=20
> certainly be willing to make a =A3200 payment or donation in return.

That's very nice of you to make the offer. Pending someone taking you up on
it, you might consider your ability to write the functions in some procedur=
al
language. They would probably be easier to write, and you'd only have to ma=
ke
them handle data types you're planning to use them with. For instance, ther=
e's
an example of PL/Perl versions available embedded in the code here:

 http://tr.im/rPDA

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: BUG #4916: wish: more statistical functions (median, percentiles etc)

From
Richard Neill
Date:
Dear Joshua,

Thanks for your reply. Sadly, I haven't the time (or expertise) to write
this myself. However, the feature would be really useful to have. I'd
certainly be willing to make a £200 payment or donation in return.

I'm aware that this number is problematic, as it undervalues developer
time (£200 is the value, to me, of that feature; which is probably
uncorrelated with the value of his time to anyone who might implement it).

The offer is there if anyone wants to take it; please feel free to
contact me off-list.

Richard




Joshua Tolley wrote:
> On Fri, Jul 10, 2009 at 11:37:46PM +0000, Richard Neill wrote:
>> In addition to the existing aggregate functions (avg, stddev etc),  it would
>> be nice if postgres could return further information. For example, the
>> quartiles, percentiles, and median.
>>
>> [mode would also be useful, as an explicit function, though we can get it
>> easily enough using count(1) order by count desc].
>>
>> According to google, this has been a wish since at least year 2000 for
>> various people, but doesn't seem to be implemented.
>
> That's because no one has yet taken the time. However, patches are welcome, if
> you'd like it enough to implement it. It's on my list of things that might be
> interesting to write, for example, but there are other things higher up on
> that list.
>
> --
> Joshua Tolley / eggyknap
> End Point Corporation
> http://www.endpoint.com

Re: BUG #4916: wish: more statistical functions (median, percentiles etc)

From
Pavel Stehule
Date:
2009/7/11 Richard Neill <rn214@cam.ac.uk>:
>
> The following bug has been logged online:
>
> Bug reference: =C2=A0 =C2=A0 =C2=A04916
> Logged by: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Richard Neill
> Email address: =C2=A0 =C2=A0 =C2=A0rn214@cam.ac.uk
> PostgreSQL version: 8.4
> Operating system: =C2=A0 Linux
> Description: =C2=A0 =C2=A0 =C2=A0 =C2=A0wish: more statistical functions =
(median, percentiles
> etc)
> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc), =C2=A0=
it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.
>
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.
>
> Thanks - Richard

hello

you can use following tricks:

create or replace function nth_percentil(anyarray, int)
returns anyelement as $$
  select $1[$2/100.0 * array_upper($1,1) + 1];
$$ language sql immutable strict;

pagila=3D# select nth_percentil(array(select length from film order by 1),9=
0);
 nth_percentil
---------------
           173
(1 row)

pagila=3D# select count(case when length < 173 then 1 end)::float /
count(*) * 100.0
            from film;
 ?column?
----------
     89.6
(1 row)

create or replace function median(anyarray)
returns float as $$
  select ($1[round(array_upper($1,1)/2.0)] + $1[array_upper($1,1) -
round(array_upper($1,1)/2.0) + 1]) / 2.0::float;
$$ language sql immutable strict;

pagila=3D# select median(array[1,2]), median(array[1,2,3]),
median(array[1,2,3,4]);
 median | median | median
--------+--------+--------
    1.5 |      2 |    2.5
(1 row)

pagila=3D# select median(array(select length from film order by 1));
 median
--------
    114
(1 row)

tested on pagila database

regards
Pavel Stehule
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>

Re: BUG #4916: wish: more statistical functions (median, percentiles etc)

From
Jasen Betts
Date:
On 2009-07-10, Richard Neill <rn214@cam.ac.uk> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      4916
> Logged by:          Richard Neill
> Email address:      rn214@cam.ac.uk
> PostgreSQL version: 8.4
> Operating system:   Linux
> Description:        wish: more statistical functions (median, percentiles
> etc)
> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc),  it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.
>
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.
>
> Thanks - Richard

percentile isn't really an agregate function as its value is mainly
determined by two (or fewer) values in the data set.

here's a function that works on arbitrarily named tables,
by building queries to get the needed statistics and then
using order by, offset, limit. Exploiting
an index (if present) to retrtieve the data values.

it won't work (well) on subqueries, but as 'col' and tbl are substitutd in
unescaped you can use joins and expressions if needed (expect a
performance hit if you do!)

-- percentile function:
-- usage example: select percentile('sometable','somecolumn','true', 0.45);
-- 'true' is a where constraint, 0.45 gets the 45th percentile.
--
-- for best preformance have an index that covers the where condition
-- and col.

CREATE OR REPLACE FUNCTION percentile ( tbl text, col text, whr text, frc double precision ) returns double precision
as$$ 
    DECLARE
        cnt integer;  -- count of records
        rat double precision; -- ratio
        pos double precision; -- postion
        res double precision; -- result
    BEGIN
        EXECUTE 'select count( '|| col ||' ) FROM '|| tbl ||' WHERE '|| whr INTO cnt;
        pos=frc*(cnt-1);
        rat=mod(pos::numeric,1);
        EXECUTE 'SELECT (SELECT '||col||' * '||(1-rat)::text
            || ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
            || 'OFFSET ' || floor(pos)::text || ' LIMIT 1 )'
            || ' + (SELECT '||col||' * '||rat::text
            || ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
            || 'OFFSET ' || ceil(pos)::text || ' LIMIT 1 )' into RES;
        RETURN res;
    END;
$$ LANGUAGE PLPGSQL;
Joshua Tolley wrote:
> On Sat, Jul 11, 2009 at 02:12:22AM +0100, Richard Neill wrote:
>> Thanks for your reply. Sadly, I haven't the time (or expertise) to write
>> this myself. However, the feature would be really useful to have. I'd
>> certainly be willing to make a £200 payment or donation in return.
>
> That's very nice of you to make the offer. Pending someone taking you up on
> it, you might consider your ability to write the functions in some procedural
> language. They would probably be easier to write, and you'd only have to make
> them handle data types you're planning to use them with. For instance, there's
> an example of PL/Perl versions available embedded in the code here:

This stuff is pretty trivial to do with PL/R

Joe