Thread: user defined aggregate for percentile calculations

user defined aggregate for percentile calculations

From
Kashmir
Date:
cant seem to find anything about this,
would it possible at all to create a percentile-aggregate in pgres?
any pointers?
tia
-k




Re: user defined aggregate for percentile calculations

From
"A. Kretschmer"
Date:
In response to Kashmir :
>
> cant seem to find anything about this,
> would it possible at all to create a percentile-aggregate in pgres?
> any pointers?
> tia
> -k

I'm not sure if i understand your problem, but how about:

test=*# select * from percentile ;
 id | value
----+-------
  1 |    10
  2 |    20
  3 |    30
  4 |    50
(4 rows)

test=*# select a.id, a.value, (a.value*100/foo.summe)::numeric(10,2)
from percentile a, (select sum(value) as summe from percentile ) as foo
group by a.id, a.value, foo.summe order by id;
 id | value | numeric
----+-------+---------
  1 |    10 |    9.00
  2 |    20 |   18.00
  3 |    30 |   27.00
  4 |    50 |   45.00
(4 rows)


HTH, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: user defined aggregate for percentile calculations

From
Kashmir
Date:
thx for anwering!,

i was more looking for a percentile calculation as in http://en.wikipedia.org/wiki/Percentile
usually i do this in perl with the 'Statistics::Descriptive' module,
but it would help me alot if i could solve this within the sql query,
and would not have to push all the data into a perl arrays first..

so i could e.g. simply say:
select 95percentile(column with many many values) from sometable
actually what i really need is something like:
select 95p(col), 98p(col), max(col), min(col), avg(col) from thetable...
that would save me alot of cycles i'm takin in perl at the moment...
guess my sql is not smart enough, but was thinking a user defined function would be of great use :-)

any more suggestions?

tia!
-k


----- Original Message ----
From: A. Kretschmer <andreas.kretschmer@schollglas.com>
To: pgsql-general@postgresql.org
Sent: Monday, April 6, 2009 4:53:28 AM
Subject: Re: [GENERAL] user defined aggregate for percentile calculations

In response to Kashmir :
>
> cant seem to find anything about this,
> would it possible at all to create a percentile-aggregate in pgres?
> any pointers?
> tia
> -k

I'm not sure if i understand your problem, but how about:

test=*# select * from percentile ;
id | value
----+-------
  1 |    10
  2 |    20
  3 |    30
  4 |    50
(4 rows)

test=*# select a.id, a.value, (a.value*100/foo.summe)::numeric(10,2)
from percentile a, (select sum(value) as summe from percentile ) as foo
group by a.id, a.value, foo.summe order by id;
id | value | numeric
----+-------+---------
  1 |    10 |    9.00
  2 |    20 |   18.00
  3 |    30 |   27.00
  4 |    50 |   45.00
(4 rows)


HTH, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general





Re: user defined aggregate for percentile calculations

From
Alvaro Herrera
Date:
Kashmir wrote:
>
> thx for anwering!,
>
> i was more looking for a percentile calculation as in http://en.wikipedia.org/wiki/Percentile
> usually i do this in perl with the 'Statistics::Descriptive' module,
> but it would help me alot if i could solve this within the sql query,
> and would not have to push all the data into a perl arrays first..

Have you considered writing a function in the PL/R language?  It has
percentiles (actually quantiles) and all sorts of useful math stuff.

See http://www.joeconway.com/plr/

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: user defined aggregate for percentile calculations

From
Greg Smith
Date:
On Sun, 5 Apr 2009, Kashmir wrote:

> would it possible at all to create a percentile-aggregate in pgres?

I normally just do this right in the database without specifically
accelerating it with an aggregate.  Not very efficient but it works fine
for reasonably sized data sets that fit into the database cache.  Here's
an example that computes some statistics about temporary data in a table
named "timing" into a summary statistics table named tests:

update tests set trans=(select count(*) from timing);
update tests set
   avg_latency=(select avg(latency) from timing),
   max_latency=(select max(latency) from timing),
   percentile_90_latency=
     (select latency from timing
      order by latency offset (round(0.90*trans)) limit 1);

Even if that's not efficient enough for your final app, you might use that
sort of thing as a prototype until you get a better implementation, rather
than dropping into Perl.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD