Thread: complex custom aggregate function
Hi all,

I have a table like:

    value int,
    quarter timestamp

I need an aggregate function that gives back the maximum "value" using this algorithm:

    AVG of the first hour (first 4 quarters)   (AVG0)
    same as above, but 1 quarter later         (AVG1)
    ....
    same as above, but n quarters later        (AVGn)

result: the quarter where AVGn was MAX.

Example:

    quarter            value    AVGn

    2008-01-01 00:00    10
    2008-01-01 00:15    15
    2008-01-01 00:30     5
    2008-01-01 00:45    20   -> 12.5  ((10+15+5+20)/4)
    2008-01-01 01:00     2   -> 10.5  ((15+5+20+2)/4)
    2008-01-01 01:15    30   -> 14.25 ((5+20+2+30)/4)

the result should be ('2008-01-01 00:30', 14.25).

It would be very easy if the input to the custom aggregate function was ordered (because I would keep 4 internal counters), but I guess there's no way of "forcing" the ordering of the input to the function, right?

So I have to cache all the (quarter, value) couples and give back a result at the end, right?
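[Editor's note: assuming the input is already ordered by quarter, the computation described above is a fixed-size moving average followed by a max; a minimal Python sketch, with the thread's example data:]

```python
from datetime import datetime

def moving_avg_max(rows, window=4):
    """rows: (quarter, value) pairs, already ordered by quarter.
    Return (start_quarter, avg) of the window with the highest average."""
    best = None
    for i in range(len(rows) - window + 1):
        chunk = rows[i:i + window]
        avg = sum(v for _, v in chunk) / window
        if best is None or avg > best[1]:
            best = (chunk[0][0], avg)
    return best

rows = [
    (datetime(2008, 1, 1, 0, 0), 10),
    (datetime(2008, 1, 1, 0, 15), 15),
    (datetime(2008, 1, 1, 0, 30), 5),
    (datetime(2008, 1, 1, 0, 45), 20),
    (datetime(2008, 1, 1, 1, 0), 2),
    (datetime(2008, 1, 1, 1, 15), 30),
]
print(moving_avg_max(rows))  # (datetime(2008, 1, 1, 0, 30), 14.25)
```

The averages over the three full windows are 12.5, 10.5 and 14.25, so the winner is the window starting at 00:30.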
Scara Maccai <m_lists@yahoo.it> writes:

> It would be very easy if the input to the custom aggregate function was
> ordered (because I would keep 4 internal counters), but I guess there's
> no way of "forcing" the ordering of the input to the function, right?

You can with a subquery. Something like

    SELECT agg(foo) FROM (SELECT foo FROM tab ORDER BY bar) AS sub;

However that will produce one record per grouping. From what I read of your description, you want to produce one record per input record. There isn't any efficient way to do that in current Postgres releases -- you would have to have a subquery which executed for every record and retrieved the set of data to aggregate.

8.4 will have OLAP window functions, which can implement things like moving averages.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Gregory Stark wrote:

> From what I read of your description, you want to produce one record per input record.

Exactly.

> 8.4 will have OLAP window functions, which can implement things like moving averages.

Using 8.3: could I do it by caching all the values "somewhere" in a custom aggregate function, so that I can sort them before giving back the result?

Thank you.
Hello,
not very smart, but something like the following should do the job.

Suppose each quarter is stored as a decimal hour (h_q) with a value x:

h_q  x
1.00 x
1.25 x
1.50 x
1.75 x
2.00 x
2.25 x
2.50 x
2.75 x
3.00 x
3.25 x
3.50 x
3.75 x
4.00 x
4.25 x
4.50 x
4.75 x
...
-- "tab" is just a placeholder for your table, with columns h_q and x
select H, A
FROM
(
  select min(h_q) as H, avg(x) as A from tab group by h_q/1
  union all
  select min(h_q), avg(x) from tab group by (h_q-0.25)/1
  union all
  select min(h_q), avg(x) from tab group by (h_q-0.50)/1
  union all
  select min(h_q), avg(x) from tab group by (h_q-0.75)/1
) foo
where A = (select max(A) from (foo..))
or use ORDER BY A desc LIMIT 1 if a single result is sufficient...
HTH,
Marc Mamin
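[Editor's note: a rough Python rendering of this idea, with illustrative names: each of the four branches buckets the quarters into hours shifted by 0, 1, 2 or 3 quarters (using floor(), as in the correction that follows on the list), and only full four-quarter buckets are candidates for the maximum.]

```python
import math
from collections import defaultdict

def shifted_hour_averages(rows, offsets=(0.0, 0.25, 0.50, 0.75)):
    """rows: (h_q, value) pairs, h_q being the quarter as a decimal hour.
    Emulates the four GROUP BY floor(h_q - offset) branches; yields
    (bucket_start, average, member_count) per bucket."""
    out = []
    for off in offsets:
        buckets = defaultdict(list)
        for h, v in rows:
            buckets[math.floor(h - off)].append((h, v))
        for members in buckets.values():
            h_min = min(h for h, _ in members)
            avg = sum(v for _, v in members) / len(members)
            out.append((h_min, avg, len(members)))
    return out

rows = list(zip([1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75],
                [10, 15, 5, 20, 2, 30, 1, 1]))
# only buckets holding all four quarters count as full windows
full = [(h, a) for h, a, n in shifted_hour_averages(rows) if n == 4]
print(max(full, key=lambda t: t[1]))  # (1.5, 14.25)
```

The partial buckets at the edges (fewer than four quarters) correspond to the incomplete groups the UNION ALL sketch would also produce; a HAVING count(*) = 4 would filter them in SQL.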
-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Scara Maccai
Sent: Fri 1/30/2009 1:45 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex custom aggregate function
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Oops,
I meant
group by floor(h_q-x)
instead of
group by (h_q-x)/1
Marc Mamin
> select H,A
> FROM
> (
> select min(h_q) as H, avg(x) as A
> group by h_q/1
> union all
> select min(h_q), avg(x)
> group by (h_q-0.25)/1
> union all
> select min(h_q), avg(x)
> group by (h_q-0.50)/1
> union all
> select min(h_q), avg(x)
> group by (h_q-0.75)/1
> )foo
> where A= select max(A) from (foo..)
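[Editor's note: the correction matters because (h_q - x)/1 keeps the fractional part, so every quarter lands in its own group, while floor() collapses four consecutive quarters into one bucket. A quick check:]

```python
import math

quarters = [1.00, 1.25, 1.50, 1.75, 2.00]
print([(q - 0.25) / 1 for q in quarters])        # [0.75, 1.0, 1.25, 1.5, 1.75] -- all distinct keys
print([math.floor(q - 0.25) for q in quarters])  # [0, 1, 1, 1, 1] -- four quarters share bucket 1
```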
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Scara Maccai
>Sent: Friday, January 30, 2009 9:36 AM
>To: pgsql-general@postgresql.org
>Subject: [GENERAL] complex custom aggregate function
>
> [...]

For that purpose -- a sliding mean calculation -- I use the following.

--
-- Sample table definition
--
CREATE TABLE tbl_ayas
(
  fulldate timestamp without time zone NOT NULL,
  id_1 real, -- temperature
  id_2 real, -- pressure
  ..........
  CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);

--
-- Function
--
CREATE OR REPLACE FUNCTION perl_sliding_mean(integer, real, integer, integer, bpchar, bpchar)
  RETURNS real AS
$BODY$
  #BEGIN { strict->import(); }

  # get the arguments
  my ($myid, $myval, $mycount, $myvalid, $myslidesum, $myreset) = @_;

  # reset the arrays if requested
  if ($myreset eq 't') {
      @stored_sl_val = ();
      @stored_arr = ();
      return 0;
  }

  # restore this series' values from the array of arrays
  @temp_sl_val = $stored_arr[$myid];
  @stored_sl_val = @{$temp_sl_val[0]};

  # push the new value; a null becomes undef, so the sum does not change
  if (!defined $myval) {
      elog(NOTICE, "perl_sliding_mean => push null value [undef]");
      push(@stored_sl_val, undef);
  } else {
      elog(NOTICE, "perl_sliding_mean => push value $myval");
      push(@stored_sl_val, $myval);
  }

  elog(NOTICE, "perl_sliding_mean => scalar array " . scalar @stored_sl_val);

  # keep only the last $mycount values
  if ((scalar @stored_sl_val) > $mycount) {
      elog(NOTICE, "perl_sliding_mean => shift element");
      # Remove one element from the beginning of the array.
      shift(@stored_sl_val);
  }

  # getting mean
  elog(NOTICE, "perl_sliding_mean => getting mean");
  my $good_values;
  my $result;
  foreach (@stored_sl_val) {
      elog(NOTICE, "arr : " . $_);
      if (defined $_) {
          $result += $_;
          $good_values++;
      }
  }
  elog(NOTICE, "perl_sliding_mean => sum : $result, good values : $good_values");

  my $mean;
  if ($good_values >= $myvalid) {
      if ($myslidesum eq 't') {
          $mean = $result;                 # sum
      } else {
          $mean = $result / $good_values;  # average
      }
  } else {
      elog(NOTICE, "perl_sliding_mean => good_values < myvalid");
      $mean = -99999999;  # skip later and return null
  }

  # save back the array of arrays
  elog(NOTICE, "perl_sliding_mean => scalar stored_sl_val " . scalar @stored_sl_val);
  $stored_arr[$myid] = [ @stored_sl_val ];

  # return the calculated sliding mean, or null
  if ($mean == -99999999) {
      return;
  }
  return $mean;
$BODY$
  LANGUAGE 'plperlu' VOLATILE;
COMMENT ON FUNCTION perl_sliding_mean(integer, real, integer, integer, bpchar, bpchar)
  IS 'Calculate sliding means/sums';

--
-- queries: first reset the state, then compute the sliding means
--
SELECT perl_sliding_mean(0, 0, 0, 0, 'f', 't');

SELECT round(cast(perl_sliding_mean(0, id_1, 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_temperature",
       round(cast(perl_sliding_mean(1, id_2, 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_pressure"
  .....

Regards,
Paolo Saudin
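[Editor's note: the gist of perl_sliding_mean, transcribed to Python as a sketch (argument names follow the plperl version): keep the last `count` values of a series, treat NULLs as undefined, and return the mean (or running sum) only once at least `valid` of them are defined.]

```python
from collections import deque

def sliding_mean(values, count, valid, want_sum=False):
    """Yield one result per input value: the mean (or sum) of the defined
    entries among the last `count` values, or None below `valid`."""
    buf = deque(maxlen=count)  # deque drops the oldest value automatically
    out = []
    for v in values:
        buf.append(v)
        good = [x for x in buf if x is not None]
        if len(good) >= valid:
            out.append(sum(good) if want_sum else sum(good) / len(good))
        else:
            out.append(None)
    return out

print(sliding_mean([1, 2, None, 4], count=3, valid=2))  # [None, 1.5, 1.5, 3.0]
```

Unlike the plperl version, this keeps its state in a local variable rather than in session-global arrays, so it needs no explicit reset call.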
Paolo Saudin wrote:

> For that purpose, a sliding mean calculation, I use the following
>
> CREATE TABLE tbl_ayas
> (
>   fulldate timestamp without time zone NOT NULL,
>   id_1 real, -- temperature
>   id_2 real, -- pressure
>   ..........
>   CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
> ) WITH (OIDS=FALSE);
>
> [...]
>
> SELECT perl_sliding_mean(0, 0, 0, 0, 'f', 't');
> SELECT round(cast(perl_sliding_mean(0, id_1, 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_temperature",
>        round(cast(perl_sliding_mean(1, id_2, 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_pressure"

I don't understand: how can you be sure that data is passed to the function ordered by "fulldate"?

Thank you.
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Scara Maccai
>Sent: Monday, February 2, 2009 10:36 AM
>To: Paolo Saudin; pgsql-general@postgresql.org
>Cc: pgsql-general
>Subject: Re: R: [GENERAL] complex custom aggregate function
>
> [...]
>
>I don't understand: how can you be sure that data is passed to the function ordered by "fulldate"?
>Thank you.

I use a master table with a "fulldate" field, filled with sequential dates, to fill gaps when meteo data is missing.

CREATE TABLE master
(
  fulldate timestamp without time zone NOT NULL,
  CONSTRAINT master_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);

So the query will be:

SELECT fulldate,
       id_3 AS "ayas_temperature",
       round(cast(perl_sliding_mean(0, id_3, 8, 6, 'f', 'f') AS numeric), 3) AS "ayas_temperature_sliding"
  FROM master
  LEFT JOIN tables_ar.tbl_ayas USING (fulldate)
 WHERE fulldate > '2009-01-01'
 ORDER BY fulldate
 LIMIT 16;

01/01/2009 01:00   -7
01/01/2009 02:00   -7.1
01/01/2009 03:00   -5.3
01/01/2009 04:00   -5.2
01/01/2009 05:00   -4.8
01/01/2009 06:00   -4
01/01/2009 07:00   -4.3
01/01/2009 08:00   -5.2   -5.363   (mean from 01/01/2009 01:00 - 01/01/2009 08:00)
01/01/2009 09:00   -5.4   -5.163
01/01/2009 10:00   -3     -4.65
01/01/2009 11:00   -0.4   -4.038
01/01/2009 12:00    0.4   -3.338
01/01/2009 13:00   -0.2   -2.763
01/01/2009 14:00   -1.8   -2.488
01/01/2009 15:00   -2.2   -2.225
01/01/2009 16:00   -2.6   -1.9     (mean from 01/01/2009 09:00 - 01/01/2009 16:00)

And all the sliding means are correct (from the 8th value onwards).

Paolo Saudin
Paolo Saudin wrote:

> I use a master table with a "fulldate" field, filled with sequential dates,
> to fill gaps when meteo data is missing.

I'm sorry, I still don't get it: how can you be sure that postgresql won't call perl_sliding_mean with timestamp data that is not ordered? I don't mean only in case of holes.

The "order by", as far as I know, is usually done at the very end of the plan, so I don't think it can affect the order of the data passed to the function...

Suppose you have data like:

01/01/2009 01:00   -7
01/01/2009 02:00   -7.1
01/01/2009 03:00   -5.3
01/01/2009 04:00   -5.2
01/01/2009 05:00   -4.8
01/01/2009 06:00   -4

What prevents postgresql from calling your function with data in this order:

01/01/2009 06:00   -4
01/01/2009 01:00   -7
01/01/2009 05:00   -4.8
01/01/2009 03:00   -5.3
01/01/2009 02:00   -7.1
01/01/2009 04:00   -5.2

and reordering ("order by fulldate") at the very end (when results from the function are already out)?

Thank you for your patience.
On Mon, Feb 2, 2009 at 2:30 PM, Scara Maccai <m_lists@yahoo.it> wrote:

> I'm sorry, I still don't get it: how can you be sure that postgresql won't
> call perl_sliding_mean with timestamp data that is not ordered? I don't
> mean only in case of holes.
>
> The "order by", as far as I know, is usually done at the very end of the
> plan, so I don't think it can affect the order of the data passed to the
> function...

You need to make a subquery with the ORDER BY on it. Postgres won't re-order an ORDER BY in a subquery to happen outside the outer query. So something like:

    select perl_function(foo) from (select foo from table order by bar)

-- 
greg
I think I've found a solution to the moving average problem myself, so I'm posting it here in case it works for someone else (and so that everybody can check that I'm doing it right and in the "best" way...).

Basically I'm:

1) saving all the couples (timestamp, double) of an aggregation into an array using array_accum
2) unnesting the same array to get a table of (timestamp, double) tuples
3) calculating the moving average using the "self join" trick (found in "Transact SQL") on the table from 2)
4) getting the max out of the average values.

Before the code, some questions:

1) I tried doing it using python, but the array was passed as a string (I didn't find that in the docs though...)

2) I didn't understand why the "array_enum" function is present only for int arrays in the contrib "intagg" module; couldn't it be a function with "anyarray" as argument? Such as:

CREATE OR REPLACE FUNCTION array_enum(anyarray)
  RETURNS setof anyelement
  AS 'array_unnest'
  LANGUAGE INTERNAL IMMUTABLE STRICT;

3) Am I doing something wrong in the code below? Could I do it better?
CREATE AGGREGATE array_accum (anyelement)
(
  sfunc = array_append,
  stype = anyarray,
  initcond = '{}'
);

CREATE TYPE timemax_t AS (
  t timestamp,
  v double precision
);

CREATE OR REPLACE FUNCTION timemax_array_enum(timemax_t[])
  RETURNS setof timemax_t
  AS 'array_unnest'
  LANGUAGE INTERNAL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION movingavgmax(a timemax_t[], nquarters int4)
  RETURNS double precision AS $$
DECLARE
  retval double precision;
BEGIN
  select max(mx) into retval
  from (
    select x.t, AVG(y.v) as mx
    from timemax_array_enum(a) as x, timemax_array_enum(a) as y
    where x.t between y.t and y.t + ((nquarters-1)*15 || ' minutes')::interval
    group by x.t
    having count(y.v) = nquarters
  ) as subs;
  return retval;
END;
$$ LANGUAGE plpgsql;

-- example usage:
select movingavgmax(array_accum((quarter, value)::timemax_t), 3)
from test
where id = 10
  and quarter between '2008-12-01 00:00:00' and '2008-12-01 10:00:00';
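[Editor's note: the self-join step can be checked against the thread's example data with a small Python model (an illustration, not the plpgsql itself): for each row x, average the y rows whose timestamp puts x inside the n-quarter window starting at y, and keep only full windows.]

```python
from datetime import datetime, timedelta

def movingavgmax(rows, nquarters):
    """rows: (timestamp, value) pairs at 15-minute spacing.
    Mirrors the self-join: x.t BETWEEN y.t AND y.t + (n-1)*15 min,
    GROUP BY x.t HAVING count = n, then MAX over the averages."""
    span = timedelta(minutes=15 * (nquarters - 1))
    best = None
    for xt, _ in rows:
        ys = [yv for yt, yv in rows if yt <= xt <= yt + span]
        if len(ys) == nquarters:  # only full n-quarter windows qualify
            avg = sum(ys) / nquarters
            if best is None or avg > best:
                best = avg
    return best

base = datetime(2008, 1, 1)
rows = [(base + timedelta(minutes=15 * i), v)
        for i, v in enumerate([10, 15, 5, 20, 2, 30])]
print(movingavgmax(rows, 4))  # 14.25
```

This is O(n^2) like the SQL self-join; the window functions arriving in 8.4 do the same in a single ordered pass.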