Thread: Re: [BUGS] BUG #4916: wish: more statistical functions (median, percentiles etc)

On Fri, Jul 10, 2009 at 11:37:46PM +0000, Richard Neill wrote:
>
> The following bug has been logged online:
>
> Bug reference:      4916
> Logged by:          Richard Neill
> Email address:      rn214@cam.ac.uk
> PostgreSQL version: 8.4
> Operating system:   Linux
> Description:        wish: more statistical functions (median, percentiles etc)

This really should go to the pgsql-general list, where I'm redirecting
follow-ups. :)

> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc),  it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.

You can get ntiles in 8.4 with the windowing functions.  As to median,
it is very precisely defined...several different and incompatible
ways.  Any ideas as to how to handle this?

> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].

You can get that with windowing functions, too. :)

> According to google, this has been a wish since at least year 2000
> for various people, but doesn't seem to be implemented.

Patches are welcome :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Re: [BUGS] BUG #4916: wish: more statistical functions (median, percentiles etc)

From
Grzegorz Jaśkiewicz
Date:
On Sun, Jul 12, 2009 at 5:53 PM, David Fetter<david@fetter.org> wrote:
>> [mode would also be useful, as an explicit function, though we can get it
>> easily enough using count(1) order by count desc].
>
> You can get that with windowing functions, too. :)
>
>> According to google, this has been a wish since at least year 2000
>> for various people, but doesn't seem to be implemented.
>
> Patches are welcome :)

Trouble in writing such aggregate, would be that it has to keep full
set, in order to sort it, and choose n/2 element (in case set contains
odd number of elements), or ([n/2]+[(n/2)+1])/2 otherwise.

I usually, if in need to calculate it , I usually do it like that (but
that's pretty slow on large sets):

pg84@atlantic:~$ psql
psql (8.4beta2)
Type "help" for help.

pg84=# create table foo(a int not null);
CREATE TABLE
pg84=# insert into foo(a) select random()*666 from generate_series(1,666);
INSERT 0 666
pg84=# select a from foo order by a limit 1 offset (select count(*)/2 from foo);
  a
-----
 321
(1 row)


(yeah, I know it is lame).

So , I think in order to create such patch, the aggregate would have
to secretly create some temporary table, to store the set first...


--
GJ

> > Details:
> >
> > In addition to the existing aggregate functions (avg, stddev etc),
> it would
> > be nice if postgres could return further information. For example,
> the
> > quartiles, percentiles, and median.
[Spotts, Christopher]
If you're interested in doing real stat work in postgres, try PL/R.