Thread: complex custom aggregate function
Hi all,

I have a table like:

    value int,
    quarter timestamp

I need an aggregate function that gives back the maximum "value" using this algorithm:

    AVG of the first hour (first 4 quarters)   (AVG0)
    same as above, but 1 quarter later         (AVG1)
    ....
    same as above, but n quarters later        (AVGn)

result: the quarter where AVGn was MAX.

Example:

    quarter            value    AVGn

    2008-01-01 00:00    10
    2008-01-01 00:15    15
    2008-01-01 00:30     5
    2008-01-01 00:45    20   -> 12.5  ((10+15+5+20)/4)
    2008-01-01 01:00     2   -> 10.5  ((15+5+20+2)/4)
    2008-01-01 01:15    30   -> 14.25 ((5+20+2+30)/4)

the result should be ('2008-01-01 00:30', 14.25).

It would be very easy if the input to the custom aggregate function was ordered (because I would keep 4 internal counters), but I guess there's no way of "forcing" the ordering of the input to the function, right?

So I have to cache all the (quarter, value) couples and give back a result at the end, right?
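[Editor's note: assuming the input is already ordered by quarter, the computation described above is a fixed-size moving average followed by a max; a minimal Python sketch, with the thread's example data:]

```python
from datetime import datetime

def moving_avg_max(rows, window=4):
    """rows: (quarter, value) pairs, already ordered by quarter.
    Return (start_quarter, avg) of the window with the highest average."""
    best = None
    for i in range(len(rows) - window + 1):
        chunk = rows[i:i + window]
        avg = sum(v for _, v in chunk) / window
        if best is None or avg > best[1]:
            best = (chunk[0][0], avg)
    return best

rows = [
    (datetime(2008, 1, 1, 0, 0), 10),
    (datetime(2008, 1, 1, 0, 15), 15),
    (datetime(2008, 1, 1, 0, 30), 5),
    (datetime(2008, 1, 1, 0, 45), 20),
    (datetime(2008, 1, 1, 1, 0), 2),
    (datetime(2008, 1, 1, 1, 15), 30),
]
print(moving_avg_max(rows))  # (datetime(2008, 1, 1, 0, 30), 14.25)
```

The averages over the three full windows are 12.5, 10.5 and 14.25, so the winner is the window starting at 00:30.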
Scara Maccai <m_lists@yahoo.it> writes:

> It would be very easy if the input to the custom aggregate function was
> ordered (because I would keep 4 internal counters), but I guess there's
> no way of "forcing" the ordering of the input to the function, right?

You can with a subquery. Something like

    SELECT agg(foo) FROM (SELECT foo FROM tab ORDER BY bar) AS sub;

However that will produce one record per grouping. From what I read of your description, you want to produce one record per input record. There isn't any efficient way to do that in current Postgres releases -- you would have to have a subquery which executed for every record and retrieved the set of data to aggregate.

8.4 will have OLAP window functions, which can implement things like moving averages.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Gregory Stark wrote:

> From what I read of your description, you want to produce one record per input record.

Exactly.

> 8.4 will have OLAP window functions, which can implement things like moving averages.

Using 8.3: could I do it by caching all the values "somewhere" in a custom aggregate function, so that I can sort them before giving back the result?

Thank you.
Hello,
not very smart, but something like the following should do the job.

Suppose each quarter is stored as a decimal hour (h_q) with a value x:

h_q  x
1.00 x
1.25 x
1.50 x
1.75 x
2.00 x
2.25 x
2.50 x
2.75 x
3.00 x
3.25 x
3.50 x
3.75 x
4.00 x
4.25 x
4.50 x
4.75 x
...
-- "tab" is just a placeholder for your table, with columns h_q and x
select H, A
FROM
(
  select min(h_q) as H, avg(x) as A from tab group by h_q/1
  union all
  select min(h_q), avg(x) from tab group by (h_q-0.25)/1
  union all
  select min(h_q), avg(x) from tab group by (h_q-0.50)/1
  union all
  select min(h_q), avg(x) from tab group by (h_q-0.75)/1
) foo
where A = (select max(A) from (foo..))
or use ORDER BY A desc LIMIT 1 if a single result is sufficient...
HTH,
Marc Mamin
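[Editor's note: a rough Python rendering of this idea, with illustrative names: each of the four branches buckets the quarters into hours shifted by 0, 1, 2 or 3 quarters (using floor(), as in the correction that follows on the list), and only full four-quarter buckets are candidates for the maximum.]

```python
import math
from collections import defaultdict

def shifted_hour_averages(rows, offsets=(0.0, 0.25, 0.50, 0.75)):
    """rows: (h_q, value) pairs, h_q being the quarter as a decimal hour.
    Emulates the four GROUP BY floor(h_q - offset) branches; yields
    (bucket_start, average, member_count) per bucket."""
    out = []
    for off in offsets:
        buckets = defaultdict(list)
        for h, v in rows:
            buckets[math.floor(h - off)].append((h, v))
        for members in buckets.values():
            h_min = min(h for h, _ in members)
            avg = sum(v for _, v in members) / len(members)
            out.append((h_min, avg, len(members)))
    return out

rows = list(zip([1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75],
                [10, 15, 5, 20, 2, 30, 1, 1]))
# only buckets holding all four quarters count as full windows
full = [(h, a) for h, a, n in shifted_hour_averages(rows) if n == 4]
print(max(full, key=lambda t: t[1]))  # (1.5, 14.25)
```

The partial buckets at the edges (fewer than four quarters) correspond to the incomplete groups the UNION ALL sketch would also produce; a HAVING count(*) = 4 would filter them in SQL.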
-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Scara Maccai
Sent: Fri 1/30/2009 1:45 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] complex custom aggregate function
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Oops,
I meant
group by floor(h_q-x)
instead of
group by (h_q-x)/1
Marc Mamin
> select H,A
> FROM
> (
> select min(h_q) as H, avg(x) as A
> group by h_q/1
> union all
> select min(h_q), avg(x)
> group by (h_q-0.25)/1
> union all
> select min(h_q), avg(x)
> group by (h_q-0.50)/1
> union all
> select min(h_q), avg(x)
> group by (h_q-0.75)/1
> )foo
> where A= select max(A) from (foo..)
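[Editor's note: the correction matters because (h_q - x)/1 keeps the fractional part, so every quarter lands in its own group, while floor() collapses four consecutive quarters into one bucket. A quick check:]

```python
import math

quarters = [1.00, 1.25, 1.50, 1.75, 2.00]
print([(q - 0.25) / 1 for q in quarters])        # [0.75, 1.0, 1.25, 1.5, 1.75] -- all distinct keys
print([math.floor(q - 0.25) for q in quarters])  # [0, 1, 1, 1, 1] -- four quarters share bucket 1
```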
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Scara Maccai
>Sent: Friday, January 30, 2009 9:36 AM
>To: pgsql-general@postgresql.org
>Subject: [GENERAL] complex custom aggregate function
>
> [...]

For that purpose -- a sliding mean calculation -- I use the following.

--
-- Sample table definition
--
CREATE TABLE tbl_ayas
(
  fulldate timestamp without time zone NOT NULL,
  id_1 real, -- temperature
  id_2 real, -- pressure
  ..........
  CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);

--
-- Function
--
CREATE OR REPLACE FUNCTION perl_sliding_mean(integer, real, integer, integer, bpchar, bpchar)
  RETURNS real AS
$BODY$
  #BEGIN { strict->import(); }

  # get the arguments
  my ($myid, $myval, $mycount, $myvalid, $myslidesum, $myreset) = @_;

  # reset the arrays if requested
  if ($myreset eq 't') {
      @stored_sl_val = ();
      @stored_arr = ();
      return 0;
  }

  # restore this series' values from the array of arrays
  @temp_sl_val = $stored_arr[$myid];
  @stored_sl_val = @{$temp_sl_val[0]};

  # push the new value; a null becomes undef, so the sum does not change
  if (!defined $myval) {
      elog(NOTICE, "perl_sliding_mean => push null value [undef]");
      push(@stored_sl_val, undef);
  } else {
      elog(NOTICE, "perl_sliding_mean => push value $myval");
      push(@stored_sl_val, $myval);
  }

  elog(NOTICE, "perl_sliding_mean => scalar array " . scalar @stored_sl_val);

  # keep only the last $mycount values
  if ((scalar @stored_sl_val) > $mycount) {
      elog(NOTICE, "perl_sliding_mean => shift element");
      # Remove one element from the beginning of the array.
      shift(@stored_sl_val);
  }

  # getting mean
  elog(NOTICE, "perl_sliding_mean => getting mean");
  my $good_values;
  my $result;
  foreach (@stored_sl_val) {
      elog(NOTICE, "arr : " . $_);
      if (defined $_) {
          $result += $_;
          $good_values++;
      }
  }
  elog(NOTICE, "perl_sliding_mean => sum : $result, good values : $good_values");

  my $mean;
  if ($good_values >= $myvalid) {
      if ($myslidesum eq 't') {
          $mean = $result;                 # sum
      } else {
          $mean = $result / $good_values;  # average
      }
  } else {
      elog(NOTICE, "perl_sliding_mean => good_values < myvalid");
      $mean = -99999999;  # skip later and return null
  }

  # save back the array of arrays
  elog(NOTICE, "perl_sliding_mean => scalar stored_sl_val " . scalar @stored_sl_val);
  $stored_arr[$myid] = [ @stored_sl_val ];

  # return the calculated sliding mean, or null
  if ($mean == -99999999) {
      return;
  }
  return $mean;
$BODY$
  LANGUAGE 'plperlu' VOLATILE;
COMMENT ON FUNCTION perl_sliding_mean(integer, real, integer, integer, bpchar, bpchar)
  IS 'Calculate sliding means/sums';

--
-- queries: first reset the state, then compute the sliding means
--
SELECT perl_sliding_mean(0, 0, 0, 0, 'f', 't');

SELECT round(cast(perl_sliding_mean(0, id_1, 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_temperature",
       round(cast(perl_sliding_mean(1, id_2, 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_pressure"
  .....

Regards,
Paolo Saudin
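[Editor's note: the gist of perl_sliding_mean, transcribed to Python as a sketch (argument names follow the plperl version): keep the last `count` values of a series, treat NULLs as undefined, and return the mean (or running sum) only once at least `valid` of them are defined.]

```python
from collections import deque

def sliding_mean(values, count, valid, want_sum=False):
    """Yield one result per input value: the mean (or sum) of the defined
    entries among the last `count` values, or None below `valid`."""
    buf = deque(maxlen=count)  # deque drops the oldest value automatically
    out = []
    for v in values:
        buf.append(v)
        good = [x for x in buf if x is not None]
        if len(good) >= valid:
            out.append(sum(good) if want_sum else sum(good) / len(good))
        else:
            out.append(None)
    return out

print(sliding_mean([1, 2, None, 4], count=3, valid=2))  # [None, 1.5, 1.5, 3.0]
```

Unlike the plperl version, this keeps its state in a local variable rather than in session-global arrays, so it needs no explicit reset call.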
Paolo Saudin wrote:

> For that purpose, a sliding mean calculation, I use the following
>
> CREATE TABLE tbl_ayas
> (
>   fulldate timestamp without time zone NOT NULL,
>   id_1 real, -- temperature
>   id_2 real, -- pressure
>   ..........
>   CONSTRAINT tbl_ayas_pkey PRIMARY KEY (fulldate)
> ) WITH (OIDS=FALSE);
>
> [...]
>
> SELECT perl_sliding_mean(0, 0, 0, 0, 'f', 't');
> SELECT round(cast(perl_sliding_mean(0, id_1, 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_temperature",
>        round(cast(perl_sliding_mean(1, id_2, 8, 6, 'f', 'f') AS numeric), 1) AS "ayas_pressure"

I don't understand: how can you be sure that data is passed to the function ordered by "fulldate"?

Thank you.
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Scara Maccai
>Sent: Monday, February 2, 2009 10:36 AM
>To: Paolo Saudin; pgsql-general@postgresql.org
>Cc: pgsql-general
>Subject: Re: R: [GENERAL] complex custom aggregate function
>
> [...]
>
>I don't understand: how can you be sure that data is passed to the function ordered by "fulldate"?
>Thank you.

I use a master table with a "fulldate" field, filled with sequential dates, to fill gaps when meteo data is missing.

CREATE TABLE master
(
  fulldate timestamp without time zone NOT NULL,
  CONSTRAINT master_pkey PRIMARY KEY (fulldate)
) WITH (OIDS=FALSE);

So the query will be:

SELECT fulldate,
       id_3 AS "ayas_temperature",
       round(cast(perl_sliding_mean(0, id_3, 8, 6, 'f', 'f') AS numeric), 3) AS "ayas_temperature_sliding"
  FROM master
  LEFT JOIN tables_ar.tbl_ayas USING (fulldate)
 WHERE fulldate > '2009-01-01'
 ORDER BY fulldate
 LIMIT 16;

01/01/2009 01:00   -7
01/01/2009 02:00   -7.1
01/01/2009 03:00   -5.3
01/01/2009 04:00   -5.2
01/01/2009 05:00   -4.8
01/01/2009 06:00   -4
01/01/2009 07:00   -4.3
01/01/2009 08:00   -5.2   -5.363   (mean from 01/01/2009 01:00 - 01/01/2009 08:00)
01/01/2009 09:00   -5.4   -5.163
01/01/2009 10:00   -3     -4.65
01/01/2009 11:00   -0.4   -4.038
01/01/2009 12:00    0.4   -3.338
01/01/2009 13:00   -0.2   -2.763
01/01/2009 14:00   -1.8   -2.488
01/01/2009 15:00   -2.2   -2.225
01/01/2009 16:00   -2.6   -1.9     (mean from 01/01/2009 09:00 - 01/01/2009 16:00)

And all the sliding means are correct (from the 8th value onwards).

Paolo Saudin
Paolo Saudin wrote:

> I use a master table with a "fulldate" field, filled with sequential dates,
> to fill gaps when meteo data is missing.

I'm sorry, I still don't get it: how can you be sure that postgresql won't call perl_sliding_mean with timestamp data that is not ordered? I don't mean only in case of holes.

The "order by", as far as I know, is usually done at the very end of the plan, so I don't think it can affect the order of the data passed to the function...

Suppose you have data like:

01/01/2009 01:00   -7
01/01/2009 02:00   -7.1
01/01/2009 03:00   -5.3
01/01/2009 04:00   -5.2
01/01/2009 05:00   -4.8
01/01/2009 06:00   -4

What prevents postgresql from calling your function with data in this order:

01/01/2009 06:00   -4
01/01/2009 01:00   -7
01/01/2009 05:00   -4.8
01/01/2009 03:00   -5.3
01/01/2009 02:00   -7.1
01/01/2009 04:00   -5.2

and reordering ("order by fulldate") at the very end (when results from the function are already out)?

Thank you for your patience.
On Mon, Feb 2, 2009 at 2:30 PM, Scara Maccai <m_lists@yahoo.it> wrote:

> I'm sorry, I still don't get it: how can you be sure that postgresql won't
> call perl_sliding_mean with timestamp data that is not ordered? I don't
> mean only in case of holes.
>
> The "order by", as far as I know, is usually done at the very end of the
> plan, so I don't think it can affect the order of the data passed to the
> function...

You need to make a subquery with the ORDER BY on it. Postgres won't re-order an ORDER BY in a subquery to happen outside the outer query. So something like:

    select perl_function(foo) from (select foo from table order by bar)

-- 
greg
I think I've found a solution to the moving average problem myself, so I'm posting it here in case it works for someone else (and so that everybody can check that I'm doing it right and in the "best" way...).

Basically I'm:

1) saving all the couples (timestamp, double) of an aggregation into an array using array_accum
2) unnesting the same array to get a table of (timestamp, double) tuples
3) calculating the moving average using the "self join" trick (found in "Transact SQL") on the table from 2)
4) getting the max out of the average values.

Before the code, some questions:

1) I tried doing it using python, but the array was passed as a string (I didn't find that in the docs though...)

2) I didn't understand why the "array_enum" function is present only for int arrays in the contrib "intagg" module; couldn't it be a function with "anyarray" as argument? Such as:

CREATE OR REPLACE FUNCTION array_enum(anyarray)
  RETURNS setof anyelement
  AS 'array_unnest'
  LANGUAGE INTERNAL IMMUTABLE STRICT;

3) Am I doing something wrong in the code below? Could I do it better?
CREATE AGGREGATE array_accum (anyelement)
(
  sfunc = array_append,
  stype = anyarray,
  initcond = '{}'
);

CREATE TYPE timemax_t AS (
  t timestamp,
  v double precision
);

CREATE OR REPLACE FUNCTION timemax_array_enum(timemax_t[])
  RETURNS setof timemax_t
  AS 'array_unnest'
  LANGUAGE INTERNAL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION movingavgmax(a timemax_t[], nquarters int4)
  RETURNS double precision AS $$
DECLARE
  retval double precision;
BEGIN
  select max(mx) into retval
  from (
    select x.t, AVG(y.v) as mx
    from timemax_array_enum(a) as x, timemax_array_enum(a) as y
    where x.t between y.t and y.t + ((nquarters-1)*15 || ' minutes')::interval
    group by x.t
    having count(y.v) = nquarters
  ) as subs;
  return retval;
END;
$$ LANGUAGE plpgsql;

-- example usage:
select movingavgmax(array_accum((quarter, value)::timemax_t), 3)
from test
where id = 10
  and quarter between '2008-12-01 00:00:00' and '2008-12-01 10:00:00';
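[Editor's note: the self-join step can be checked against the thread's example data with a small Python model (an illustration, not the plpgsql itself): for each row x, average the y rows whose timestamp puts x inside the n-quarter window starting at y, and keep only full windows.]

```python
from datetime import datetime, timedelta

def movingavgmax(rows, nquarters):
    """rows: (timestamp, value) pairs at 15-minute spacing.
    Mirrors the self-join: x.t BETWEEN y.t AND y.t + (n-1)*15 min,
    GROUP BY x.t HAVING count = n, then MAX over the averages."""
    span = timedelta(minutes=15 * (nquarters - 1))
    best = None
    for xt, _ in rows:
        ys = [yv for yt, yv in rows if yt <= xt <= yt + span]
        if len(ys) == nquarters:  # only full n-quarter windows qualify
            avg = sum(ys) / nquarters
            if best is None or avg > best:
                best = avg
    return best

base = datetime(2008, 1, 1)
rows = [(base + timedelta(minutes=15 * i), v)
        for i, v in enumerate([10, 15, 5, 20, 2, 30])]
print(movingavgmax(rows, 4))  # 14.25
```

This is O(n^2) like the SQL self-join; the window functions arriving in 8.4 do the same in a single ordered pass.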