Thread: BUG #4916: wish: more statistical functions (median, percentiles etc)
The following bug has been logged online: Bug reference: 4916 Logged by: Richard Neill Email address: rn214@cam.ac.uk PostgreSQL version: 8.4 Operating system: Linux Description: wish: more statistical functions (median, percentiles etc) Details: In addition to the existing aggregate functions (avg, stddev etc), it would be nice if postgres could return further information. For example, the quartiles, percentiles, and median. [mode would also be useful, as an explicit function, though we can get it easily enough using count(1) order by count desc]. According to google, this has been a wish since at least year 2000 for various people, but doesn't seem to be implemented. Thanks - Richard
On Fri, Jul 10, 2009 at 11:37:46PM +0000, Richard Neill wrote: > In addition to the existing aggregate functions (avg, stddev etc), it wo= uld > be nice if postgres could return further information. For example, the > quartiles, percentiles, and median.=20=20 >=20 > [mode would also be useful, as an explicit function, though we can get it > easily enough using count(1) order by count desc]. >=20 > According to google, this has been a wish since at least year 2000 for > various people, but doesn't seem to be implemented.=20 That's because no one has yet taken the time. However, patches are welcome,= if you'd like it enough to implement it. It's on my list of things that might = be interesting to write, for example, but there are other things higher up on that list. -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
On Sat, Jul 11, 2009 at 02:12:22AM +0100, Richard Neill wrote: > Thanks for your reply. Sadly, I haven't the time (or expertise) to write= =20=20 > this myself. However, the feature would be really useful to have. I'd=20= =20 > certainly be willing to make a =A3200 payment or donation in return. That's very nice of you to make the offer. Pending someone taking you up on it, you might consider your ability to write the functions in some procedur= al language. They would probably be easier to write, and you'd only have to ma= ke them handle data types you're planning to use them with. For instance, ther= e's an example of PL/Perl versions available embedded in the code here: http://tr.im/rPDA -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
Dear Joshua, Thanks for your reply. Sadly, I haven't the time (or expertise) to write this myself. However, the feature would be really useful to have. I'd certainly be willing to make a £200 payment or donation in return. I'm aware that this number is problematic, as it undervalues developer time (£200 is the value, to me, of that feature; which is probably uncorrelated with the value of his time to anyone who might implement it). The offer is there if anyone wants to take it; please feel free to contact me off-list. Richard Joshua Tolley wrote: > On Fri, Jul 10, 2009 at 11:37:46PM +0000, Richard Neill wrote: >> In addition to the existing aggregate functions (avg, stddev etc), it would >> be nice if postgres could return further information. For example, the >> quartiles, percentiles, and median. >> >> [mode would also be useful, as an explicit function, though we can get it >> easily enough using count(1) order by count desc]. >> >> According to google, this has been a wish since at least year 2000 for >> various people, but doesn't seem to be implemented. > > That's because no one has yet taken the time. However, patches are welcome, if > you'd like it enough to implement it. It's on my list of things that might be > interesting to write, for example, but there are other things higher up on > that list. > > -- > Joshua Tolley / eggyknap > End Point Corporation > http://www.endpoint.com
2009/7/11 Richard Neill <rn214@cam.ac.uk>: > > The following bug has been logged online: > > Bug reference: =C2=A0 =C2=A0 =C2=A04916 > Logged by: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Richard Neill > Email address: =C2=A0 =C2=A0 =C2=A0rn214@cam.ac.uk > PostgreSQL version: 8.4 > Operating system: =C2=A0 Linux > Description: =C2=A0 =C2=A0 =C2=A0 =C2=A0wish: more statistical functions = (median, percentiles > etc) > Details: > > In addition to the existing aggregate functions (avg, stddev etc), =C2=A0= it would > be nice if postgres could return further information. For example, the > quartiles, percentiles, and median. > > [mode would also be useful, as an explicit function, though we can get it > easily enough using count(1) order by count desc]. > > According to google, this has been a wish since at least year 2000 for > various people, but doesn't seem to be implemented. > > Thanks - Richard hello you can use following tricks: create or replace function nth_percentil(anyarray, int) returns anyelement as $$ select $1[$2/100.0 * array_upper($1,1) + 1]; $$ language sql immutable strict; pagila=3D# select nth_percentil(array(select length from film order by 1),9= 0); nth_percentil --------------- 173 (1 row) pagila=3D# select count(case when length < 173 then 1 end)::float / count(*) * 100.0 from film; ?column? ---------- 89.6 (1 row) create or replace function median(anyarray) returns float as $$ select ($1[round(array_upper($1,1)/2.0)] + $1[array_upper($1,1) - round(array_upper($1,1)/2.0) + 1]) / 2.0::float; $$ language sql immutable strict; pagila=3D# select median(array[1,2]), median(array[1,2,3]), median(array[1,2,3,4]); median | median | median --------+--------+-------- 1.5 | 2 | 2.5 (1 row) pagila=3D# select median(array(select length from film order by 1)); median -------- 114 (1 row) tested on pagila database regards Pavel Stehule > > -- > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-bugs >
On 2009-07-10, Richard Neill <rn214@cam.ac.uk> wrote: > > The following bug has been logged online: > > Bug reference: 4916 > Logged by: Richard Neill > Email address: rn214@cam.ac.uk > PostgreSQL version: 8.4 > Operating system: Linux > Description: wish: more statistical functions (median, percentiles > etc) > Details: > > In addition to the existing aggregate functions (avg, stddev etc), it would > be nice if postgres could return further information. For example, the > quartiles, percentiles, and median. > > [mode would also be useful, as an explicit function, though we can get it > easily enough using count(1) order by count desc]. > > According to google, this has been a wish since at least year 2000 for > various people, but doesn't seem to be implemented. > > Thanks - Richard percentile isn't really an agregate function as its value is mainly determined by two (or fewer) values in the data set. here's a function that works on arbitrarily named tables, by building queries to get the needed statistics and then using order by, offset, limit. Exploiting an index (if present) to retrtieve the data values. it won't work (well) on subqueries, but as 'col' and tbl are substitutd in unescaped you can use joins and expressions if needed (expect a performance hit if you do!) -- percentile function: -- usage example: select percentile('sometable','somecolumn','true', 0.45); -- 'true' is a where constraint, 0.45 gets the 45th percentile. -- -- for best preformance have an index that covers the where condition -- and col. CREATE OR REPLACE FUNCTION percentile ( tbl text, col text, whr text, frc double precision ) returns double precision as$$ DECLARE cnt integer; -- count of records rat double precision; -- ratio pos double precision; -- postion res double precision; -- result BEGIN EXECUTE 'select count( '|| col ||' ) FROM '|| tbl ||' WHERE '|| whr INTO cnt; pos=frc*(cnt-1); rat=mod(pos::numeric,1); EXECUTE 'SELECT (SELECT '||col||' * '||(1-rat)::text || ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC ' || 'OFFSET ' || floor(pos)::text || ' LIMIT 1 )' || ' + (SELECT '||col||' * '||rat::text || ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC ' || 'OFFSET ' || ceil(pos)::text || ' LIMIT 1 )' into RES; RETURN res; END; $$ LANGUAGE PLPGSQL;
Joshua Tolley wrote: > On Sat, Jul 11, 2009 at 02:12:22AM +0100, Richard Neill wrote: >> Thanks for your reply. Sadly, I haven't the time (or expertise) to write >> this myself. However, the feature would be really useful to have. I'd >> certainly be willing to make a £200 payment or donation in return. > > That's very nice of you to make the offer. Pending someone taking you up on > it, you might consider your ability to write the functions in some procedural > language. They would probably be easier to write, and you'd only have to make > them handle data types you're planning to use them with. For instance, there's > an example of PL/Perl versions available embedded in the code here: This stuff is pretty trivial to do with PL/R Joe