Thread: user defined aggregate for percentile calculations
cant seem to find anything about this, would it possible at all to create a percentile-aggregate in pgres? any pointers? tia -k
In response to Kashmir : > > cant seem to find anything about this, > would it possible at all to create a percentile-aggregate in pgres? > any pointers? > tia > -k I'm not sure if i understand your problem, but how about: test=*# select * from percentile ; id | value ----+------- 1 | 10 2 | 20 3 | 30 4 | 50 (4 rows) test=*# select a.id, a.value, (a.value*100/foo.summe)::numeric(10,2) from percentile a, (select sum(value) as summe from percentile ) as foo group by a.id, a.value, foo.summe order by id; id | value | numeric ----+-------+--------- 1 | 10 | 9.00 2 | 20 | 18.00 3 | 30 | 27.00 4 | 50 | 45.00 (4 rows) HTH, Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
thx for anwering!, i was more looking for a percentile calculation as in http://en.wikipedia.org/wiki/Percentile usually i do this in perl with the 'Statistics::Descriptive' module, but it would help me alot if i could solve this within the sql query, and would not have to push all the data into a perl arrays first.. so i could e.g. simply say: select 95percentile(column with many many values) from sometable actually what i really need is something like: select 95p(col), 98p(col), max(col), min(col), avg(col) from thetable... that would save me alot of cycles i'm takin in perl at the moment... guess my sql is not smart enough, but was thinking a user defined function would be of great use :-) any more suggestions? tia! -k ----- Original Message ---- From: A. Kretschmer <andreas.kretschmer@schollglas.com> To: pgsql-general@postgresql.org Sent: Monday, April 6, 2009 4:53:28 AM Subject: Re: [GENERAL] user defined aggregate for percentile calculations In response to Kashmir : > > cant seem to find anything about this, > would it possible at all to create a percentile-aggregate in pgres? > any pointers? > tia > -k I'm not sure if i understand your problem, but how about: test=*# select * from percentile ; id | value ----+------- 1 | 10 2 | 20 3 | 30 4 | 50 (4 rows) test=*# select a.id, a.value, (a.value*100/foo.summe)::numeric(10,2) from percentile a, (select sum(value) as summe from percentile ) as foo group by a.id, a.value, foo.summe order by id; id | value | numeric ----+-------+--------- 1 | 10 | 9.00 2 | 20 | 18.00 3 | 30 | 27.00 4 | 50 | 45.00 (4 rows) HTH, Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Kashmir wrote: > > thx for anwering!, > > i was more looking for a percentile calculation as in http://en.wikipedia.org/wiki/Percentile > usually i do this in perl with the 'Statistics::Descriptive' module, > but it would help me alot if i could solve this within the sql query, > and would not have to push all the data into a perl arrays first.. Have you considered writing a function in the PL/R language? It has percentiles (actually quantiles) and all sorts of useful math stuff. See http://www.joeconway.com/plr/ -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Sun, 5 Apr 2009, Kashmir wrote: > would it possible at all to create a percentile-aggregate in pgres? I normally just do this right in the database without specifically accelerating it with an aggregate. Not very efficient but it works fine for reasonably sized data sets that fit into the database cache. Here's an example that computes some statistics about temporary data in a table named "timing" into a summary statistics table named tests: update tests set trans=(select count(*) from timing); update tests set avg_latency=(select avg(latency) from timing), max_latency=(select max(latency) from timing), percentile_90_latency= (select latency from timing order by latency offset (round(0.90*trans)) limit 1); Even if that's not efficient enough for your final app, you might use that sort of thing as a prototype until you get a better implementation, rather than dropping into Perl. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD