Thread: how do you write an aggregate function
The help is not real clear, nor can I find examples of how to write an aggregate function. I searched the archive of the mailing list and did not find anything.

I need to write a weighted average function, so the need to keep track of multiple variables between function calls is a must.

I see how the sfunc ( internal-state, next-data-values ) keeps track of the passed values with the internal state. Can the internal state be an array type where I just add new entries to a multi-dimensional array, then do all the math in the ffunc?

One person pointed out the pl/r plugin, but what I'm working on gets tied into the PostBooks/OpenMfg project, and they are not real big fans of adding another language and plugin they have to make sure their customers have deployed (and I'm one of those customers). PostBooks/OpenMfg will want all the functions in pl/pgsql; does this present any major problems?
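For reference, a minimal sketch of the pieces being asked about: an aggregate whose internal state is an array, with the math done in the final function. All names here (simple_avg and its support functions) are invented for illustration and are not from this thread.

-- state function: the state really can be an array; array_append grows it
CREATE OR REPLACE FUNCTION simple_avg_sf(state numeric[], val numeric)
RETURNS numeric[] AS $$
BEGIN
    RETURN array_append(state, val);
END;
$$ LANGUAGE plpgsql;

-- final function: loop over the accumulated values and do the math once
CREATE OR REPLACE FUNCTION simple_avg_ff(state numeric[])
RETURNS numeric AS $$
DECLARE
    total numeric := 0;
BEGIN
    IF state IS NULL OR array_upper(state, 1) IS NULL THEN
        RETURN NULL;                       -- no input rows
    END IF;
    FOR i IN array_lower(state, 1) .. array_upper(state, 1) LOOP
        total := total + state[i];
    END LOOP;
    RETURN total / (array_upper(state, 1) - array_lower(state, 1) + 1);
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE simple_avg(numeric) (
    sfunc     = simple_avg_sf,
    stype     = numeric[],
    initcond  = '{}',
    finalfunc = simple_avg_ff
);

-- usage: SELECT simple_avg(some_numeric_column) FROM some_table;

As the rest of the thread shows, letting the state array grow with the number of input rows turns out to be the performance problem.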
Justin wrote:
> The help is not real clear nor can i find examples how to write an
> aggregate function.

Examples: http://www.postgresql.org/docs/8.2/static/xaggr.html

> I searched the archive of the mail list and did not find anything

The online documentation is excellent for these sorts of things.

> I need to write Weighted Average function so the need to keep track of
> multiple variables between function calls is must?

You don't necessarily need an aggregate function to do this. A weighted average takes several independent variables, weights them based on some constant (usually the difference between a static time and the time at which the data were recorded), and returns a value [0]. Maintaining state between calls is probably going to be more trouble than it's worth, especially if you're recomputing the weights all the time... which, in most cases, is what happens.

I perform exponential moving average analysis of stock market and trading data, for which I have a table that contains columns like the following (these data are not intended to resemble the performance of any particular security).

 id | time  | price
----+-------+-------
  1 | 09:30 | 89.54
  2 | 09:31 | 89.58
  3 | 09:32 | 89.53
  4 | 09:33 | 89.5
  5 | 09:34 | 89.51
  6 | 09:35 | 89.5
  7 | 09:36 | 89.42
  8 | 09:37 | 89.44

When I compute the exponential average of these data, I'm always looking at the "most recent" X prices, as I loop over all the rows in which I'm interested. Which means I need to recompute the weighted values for every minute of data (in the case of this sample table, anyway). Maintaining state for that sort of calculation wouldn't be worth the overhead.

I suggest writing a function (in PL/pgSQL or whatever your favorite flavor is) that performs a query to retrieve all the rows you need and outputs a SETOF data that contains the weighted averages. If you only need one average at a time, just return a single value instead of a SETOF values.

I hope this helps, but in case it doesn't, you should probably give us a little more detail about what you're actually trying to do.

Colin

[0] Using a formula like this:
<http://en.wikipedia.org/wiki/Moving_average#Weighted_moving_average>
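A rough sketch of the kind of function Colin describes, in the single-value form. The prices table and its id/time/price columns follow his sample above; the function name and the simple "newer rows weigh more" scheme are invented for illustration.

CREATE OR REPLACE FUNCTION weighted_avg_price(p_from time, p_to time)
RETURNS numeric AS $$
DECLARE
    r      record;
    w      numeric := 0;   -- running weight: newer rows get larger weights
    sum_w  numeric := 0;   -- sum of weights
    sum_wv numeric := 0;   -- sum of weight * price
BEGIN
    FOR r IN
        SELECT price
        FROM prices
        WHERE "time" BETWEEN p_from AND p_to
        ORDER BY "time"
    LOOP
        w      := w + 1;
        sum_w  := sum_w + w;
        sum_wv := sum_wv + w * r.price;
    END LOOP;

    IF sum_w = 0 THEN
        RETURN NULL;       -- no rows in the window
    END IF;
    RETURN sum_wv / sum_w;
END;
$$ LANGUAGE plpgsql;

-- usage: SELECT weighted_avg_price('09:30', '09:37');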
On Tue, Mar 4, 2008 at 8:44 AM, Justin <justin@emproshunts.com> wrote:
> I searched the archive of the mail list and did not find anything

I don't know if you've already seen this, but this is the question that I asked a while back:

http://archives.postgresql.org/pgsql-general/2007-12/msg00681.php

Regards,
Richard Broersma Jr.
Richard Broersma wrote:
> On Tue, Mar 4, 2008 at 8:44 AM, Justin <justin@emproshunts.com> wrote:
>> I searched the archive of the mail list and did not find anything

Search the documentation. There are a couple great examples posted at

http://www.postgresql.org/docs/8.0/interactive/sql-createaggregate.html
Thank you. I was looking for examples and never thought to look at the user comments; I searched the help files several times but never thought to look at the user-comment help online, and the help that comes with PgAdmin III does not include the user comments.

Once I get the aggregate function working I will add it to the comment section or post it back here, since it will be a more complicated aggregate function than what is listed there with the sfunc and ffunc calls.

btober@ct.metrocast.net wrote:
> Richard Broersma wrote:
>> On Tue, Mar 4, 2008 at 8:44 AM, Justin <justin@emproshunts.com> wrote:
>>> I searched the archive of the mail list and did not find anything
>
> Search the documentation. There are a couple great examples posted at
>
> http://www.postgresql.org/docs/8.0/interactive/sql-createaggregate.html
I got the aggregate function for weighted average done. I was finally left alone for more than 10 minutes to actually get it written. It takes two input values, the Weight and the Value. It sums the weighted entries to create the constant, then does the normal formula, but returns an averaged number rather than a percentage; a simple change on the return line and it can do percentages.

I did a couple of things a little odd: instead of a multi-dimensional array I used a one-dimensional array where the first entry of each pair is the Weight and the second is the Value. This made the loop through the array look stupid.

I tested it across 50,000 records; with a GROUP BY it took 3.3 seconds to run. Without the GROUP BY clause performance is terrible, taking several minutes just to do the sfunc part: 371,563 ms.

The array seems to be the performance hit. Any advice? It could be the way I'm appending to the array, which has a performance hit as the array gets bigger and bigger.

btober@ct.metrocast.net wrote:
> Richard Broersma wrote:
>> On Tue, Mar 4, 2008 at 8:44 AM, Justin <justin@emproshunts.com> wrote:
>>> I searched the archive of the mail list and did not find anything
>
> Search the documentation. There are a couple great examples posted at
>
> http://www.postgresql.org/docs/8.0/interactive/sql-createaggregate.html
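Justin's code went out as an attachment and is not in the archive; a rough reconstruction of the interleaved-array approach he describes might look like the following. The state function is essentially the one he posts later in the thread (with named parameters instead of $1/$2/$3); the final function and the CREATE AGGREGATE are guesses.

-- state: a flat numeric[] holding weight, value, weight, value, ...
CREATE OR REPLACE FUNCTION wcost_average_sf(state numeric[], weight numeric, value numeric)
RETURNS numeric[] AS $$
BEGIN
    RETURN array_append(array_append(state, weight), value);
END;
$$ LANGUAGE plpgsql;

-- walk the array two entries at a time and do the math at the end
CREATE OR REPLACE FUNCTION wcost_average_ff(state numeric[])
RETURNS numeric AS $$
DECLARE
    i      integer;
    sum_w  numeric := 0;
    sum_wv numeric := 0;
BEGIN
    IF state IS NULL OR array_upper(state, 1) IS NULL THEN
        RETURN NULL;
    END IF;
    i := array_lower(state, 1);
    WHILE i < array_upper(state, 1) LOOP
        sum_w  := sum_w  + state[i];                 -- weight
        sum_wv := sum_wv + state[i] * state[i + 1];  -- weight * value
        i := i + 2;
    END LOOP;
    IF sum_w = 0 THEN
        RETURN NULL;
    END IF;
    RETURN sum_wv / sum_w;   -- change this line to return a percentage instead
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE wcost_average(numeric, numeric) (
    sfunc     = wcost_average_sf,
    stype     = numeric[],
    initcond  = '{}',
    finalfunc = wcost_average_ff
);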
Array appends are usually a performance hit, as you said; I'm not sure, though, with PostgreSQL. Why not try it with two arrays and see what happens? At least you would be reducing the single array and eliminating the append.

danap.

> I got the aggregate function for weighted average done. I was finally
> left alone for more than 10 minutes to actually get it written. It
> takes two input values, the Weight and the Value. It sums the weighted
> entries to create the constant, then does the normal formula, but
> returns an averaged number rather than a percentage; a simple change on
> the return line and it can do percentages.
>
> I did a couple of things a little odd: instead of a multi-dimensional
> array I used a one-dimensional array where the first entry of each pair
> is the Weight and the second is the Value. This made the loop through
> the array look stupid.
>
> I tested it across 50,000 records; with a GROUP BY it took 3.3 seconds
> to run.
>
> Without the GROUP BY clause performance is terrible, taking several
> minutes just to do the sfunc part: 371,563 ms.
>
> The array seems to be the performance hit. Any advice? It could be the
> way I'm appending to the array, which has a performance hit as the
> array gets bigger and bigger.
If I use two arrays, how do I append new entries into these arrays? array_append only allows for one dimension. What other methods are there to add new entries to the array?

dmp wrote:
> Array appends are usually a performance hit, as you said; I'm not sure,
> though, with PostgreSQL. Why not try it with two arrays and see what
> happens? At least you would be reducing the single array and
> eliminating the append.
>
> danap.
>
>> I got the aggregate function for weighted average done. I was finally
>> left alone for more than 10 minutes to actually get it written. It
>> takes two input values, the Weight and the Value. It sums the weighted
>> entries to create the constant, then does the normal formula, but
>> returns an averaged number rather than a percentage; a simple change
>> on the return line and it can do percentages.
>>
>> I did a couple of things a little odd: instead of a multi-dimensional
>> array I used a one-dimensional array where the first entry of each
>> pair is the Weight and the second is the Value. This made the loop
>> through the array look stupid.
>>
>> I tested it across 50,000 records; with a GROUP BY it took 3.3 seconds
>> to run.
>>
>> Without the GROUP BY clause performance is terrible, taking several
>> minutes just to do the sfunc part: 371,563 ms.
>>
>> The array seems to be the performance hit. Any advice? It could be
>> the way I'm appending to the array, which has a performance hit as
>> the array gets bigger and bigger.
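Two ways this could be done, sketched here with invented names (none of this is from the thread): keep the two arrays as fields of a composite state type, so each stays one-dimensional and array_append still applies; or build a genuinely multi-dimensional array, for which the || operator or array_cat is used rather than array_append.

-- option 1: two parallel one-dimensional arrays wrapped in a composite type
CREATE TYPE wavg_pairs_state AS (weights numeric[], vals numeric[]);

CREATE OR REPLACE FUNCTION wavg_pairs_sf(state wavg_pairs_state, weight numeric, value numeric)
RETURNS wavg_pairs_state AS $$
BEGIN
    state.weights := array_append(state.weights, weight);
    state.vals    := array_append(state.vals, value);
    RETURN state;
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE wavg_pairs(numeric, numeric) (
    sfunc    = wavg_pairs_sf,
    stype    = wavg_pairs_state,
    initcond = '("{}","{}")'
    -- a final function looping over state.weights / state.vals would do the math
);

-- option 2: a true two-dimensional array; append a one-row sub-array with ||
-- SELECT ARRAY[[1,2]] || ARRAY[3,4];   yields {{1,2},{3,4}}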
Justin,

I'm really not an expert in this area, on how to code this, or on functions in PostgreSQL. All I can offer is some ideas which you might try; sometimes this is all it takes. Perhaps someone else will respond who might be more helpful. With that said, I have read in the documentation about the use of other languages, and if you are looking for higher performance, that is perhaps the way it is going to come about.

danap.

> DMP, you did give me an idea on changing how to call the array append.
> The sfunc now looks like this:
>
> create or replace function wcost_average_sf (numeric[], numeric, numeric)
> returns numeric[] as
> $Body$
> begin
>     return array_append(array_append($1, $2), $3);
> end;
> $Body$
> LANGUAGE 'plpgsql' VOLATILE;
>
> This yanked 140,000 ms out of the run time, a big improvement, but
> nowhere near where I want it to be.
>
> Are there speed improvements in the other languages, like TCL?
>
> dmp wrote:
>> Array appends are usually a performance hit, as you said; I'm not
>> sure, though, with PostgreSQL. Why not try it with two arrays and see
>> what happens? At least you would be reducing the single array and
>> eliminating the append.
>>
>> danap.
>>
>>> I got the aggregate function for weighted average done. I was finally
>>> left alone for more than 10 minutes to actually get it written. It
>>> takes two input values, the Weight and the Value. It sums the
>>> weighted entries to create the constant, then does the normal
>>> formula, but returns an averaged number rather than a percentage; a
>>> simple change on the return line and it can do percentages.
>>>
>>> I did a couple of things a little odd: instead of a multi-dimensional
>>> array I used a one-dimensional array where the first entry of each
>>> pair is the Weight and the second is the Value. This made the loop
>>> through the array look stupid.
>>>
>>> I tested it across 50,000 records; with a GROUP BY it took 3.3
>>> seconds to run.
>>>
>>> Without the GROUP BY clause performance is terrible, taking several
>>> minutes just to do the sfunc part: 371,563 ms.
>>>
>>> The array seems to be the performance hit. Any advice? It could be
>>> the way I'm appending to the array, which has a performance hit as
>>> the array gets bigger and bigger.
On Sun, Mar 09, 2008 at 11:40:47AM -0500, Justin wrote:
> I got the aggregate function for weighted average done. I was finally
> left alone for more than 10 minutes to actually get it written. It
> takes two input values, the Weight and the Value. It sums the weighted
> entries to create the constant, then does the normal formula, but
> returns an averaged number rather than a percentage; a simple change on
> the return line and it can do percentages.

If you're looking for performance, ISTM the best option would be to simply accumulate the weights and value*weight as you go and do a division at the end. That seems likely to beat any implementation involving array_append.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
I wish that could work, but I need to keep track of the individual weights, as each is a percentage of the total amount of the weight.

I would never have thought the performance of the Array would suck this bad.

Martijn van Oosterhout wrote:
> On Sun, Mar 09, 2008 at 11:40:47AM -0500, Justin wrote:
>> I got the aggregate function for weighted average done. I was finally
>> left alone for more than 10 minutes to actually get it written. It
>> takes two input values, the Weight and the Value. It sums the weighted
>> entries to create the constant, then does the normal formula, but
>> returns an averaged number rather than a percentage; a simple change
>> on the return line and it can do percentages.
>
> If you're looking for performance, ISTM the best option would be to
> simply accumulate the weights and value*weight as you go and do a
> division at the end. That seems likely to beat any implementation
> involving array_append.
>
> Have a nice day,
>> A couple of email pointers:
>>
>> * Please don't use language like "the Array would suck this bad" in
>> lists. I swear like a sailor in person, but it reads *very* badly in
>> email. There are a lot of people on this list who might take offense,
>> since it is their hard work that means that arrays work at all for
>> you...

That comment was not meant to be an insult or disparaging in any way whatsoever. If it was taken as such, then I'm sorry.
It seems the biggest performance hit is the copying of the array contents from one memory variable to another, which is happening a lot.

I'm not really against using temp tables to hold onto values. I used to do that in FoxPro when I hit the hard limit on its arrays, but other problems start popping up. If we use a temp table, keeping track of what is going on with other users can make life fun.

I really want to figure out how to speed this up. I have to write a lot more aggregate functions to analyze R&D data later this year; right now this function will be used in calculating manufacturing cost.
Webb Sprague wrote:
> A couple of email pointers:
>
> * Please don't use language like "the Array would suck this bad" in
> lists. I swear like a sailor in person, but it reads *very* badly in
> email. There are a lot of people on this list who might take offense,
> since it is their hard work that means that arrays work at all for
> you...
>
> * Please don't topquote.
>
> On Mon, Mar 10, 2008 at 8:01 AM, Justin <justin@emproshunts.com> wrote:
>> I wish that could work, but I need to keep track of the individual
>> weights, as each is a percentage of the total amount of the weight.
>
> The definition of "weighted mean" doesn't require that the weights be
> normalized. You probably need to calculate this separately, either
> using a temp table or a column in the original table.
>
>> I would never have thought the performance of the Array would suck
>> this bad.
>
> You should be using a temp table or a new column as above to store the
> normalized weights, and then calling the original aggregate on those:
> (a) calculate the sum of weights (aggregate), (b) calculate the
> normalized weight for each row (weight / total), (c) find the weighted
> mean with an aggregate that retains only the totals between function
> calls.
>
> If you let your arrays grow to the size of tables, performance will
> suffer (as they are not meant to do that anyway). I don't think you can
> avoid two passes, whether inside a function or outside.
>
>> Martijn van Oosterhout wrote:
>>> On Sun, Mar 09, 2008 at 11:40:47AM -0500, Justin wrote:
>>>> I got the aggregate function for weighted average done. I was
>>>> finally left alone for more than 10 minutes to actually get it
>>>> written. It takes two input values, the Weight and the Value. It
>>>> sums the weighted entries to create the constant, then does the
>>>> normal formula, but returns an averaged number rather than a
>>>> percentage; a simple change on the return line and it can do
>>>> percentages.
>>>
>>> If you're looking for performance, ISTM the best option would be to
>>> simply accumulate the weights and value*weight as you go and do a
>>> division at the end. That seems likely to beat any implementation
>>> involving array_append.
>>>
>>> Have a nice day,
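Webb's steps (a) through (c) amount to something like the following plain SQL, sketched with an invented samples(weight, value) table.

-- (a) total weight, kept in a temp table as suggested
CREATE TEMP TABLE w_total AS
    SELECT sum(weight) AS total FROM samples;

-- (b) + (c) normalize each weight against the total and take the weighted mean
SELECT sum((s.weight / t.total) * s.value) AS weighted_mean
FROM samples s, w_total t;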
On Mon, Mar 10, 2008 at 11:00 AM, Justin <justin@emproshunts.com> wrote:
> That comment was not meant to be an insult or disparaging in any way
> whatsoever. If it was taken as such, then I'm sorry.

I am sure it would have been fine in person, I just think over email it sounded abrasive. But could you stop topquoting please?

> It seems the biggest performance hit is the copying of the array
> contents from one memory variable to another, which is happening a lot.

Yeah, I think arrays just can't handle a whole lot of data, that is all. They are "tricky", and shouldn't be used for heavy lifting (more than 1k of elements feels like you are asking for trouble).

> I'm not really against using temp tables to hold onto values. I used to
> do that in FoxPro when I hit the hard limit on its arrays, but other
> problems start popping up. If we use a temp table, keeping track of
> what is going on with other users can make life fun.

I think temp tables have scope, though you should test this, so that you can use them with impunity in functions and not worry with multiple users.

> I really want to figure out how to speed this up. I have to write a lot
> more aggregate functions to analyze R&D data later this year; right now
> this function will be used in calculating manufacturing cost.

I think a combination of aggregate functions along with some more design would be best. For example: can you have a trigger calculate the normalized weight of a row on insert? Have triggers keep another table with summary information updated as you modify the data? Etc. There is a lot to PG that would help for this kind of thing.
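A sketch of the trigger idea Webb mentions, using the same invented samples(weight, value) table as above plus a one-row summary table; none of these names come from the thread.

-- hypothetical data table and summary table
CREATE TABLE samples (weight numeric, value numeric);

CREATE TABLE weight_summary (
    total_weight numeric NOT NULL DEFAULT 0,
    total_wv     numeric NOT NULL DEFAULT 0
);
INSERT INTO weight_summary DEFAULT VALUES;

-- keep the running totals up to date as rows are added
CREATE OR REPLACE FUNCTION samples_weight_trg() RETURNS trigger AS $$
BEGIN
    UPDATE weight_summary
       SET total_weight = total_weight + NEW.weight,
           total_wv     = total_wv     + NEW.weight * NEW.value;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER samples_weight_ins
    AFTER INSERT ON samples
    FOR EACH ROW EXECUTE PROCEDURE samples_weight_trg();

-- the weighted mean then comes straight from the summary table:
-- SELECT total_wv / total_weight FROM weight_summary;

A production version would also need UPDATE and DELETE triggers to keep the totals honest.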
On Mon, Mar 10, 2008 at 10:01:47AM -0500, Justin wrote:
> I wish that could work, but I need to keep track of the individual
> weights, as each is a percentage of the total amount of the weight.

Unless you have a different meaning of weighted average to me, I don't think you do. AFAIK this would produce exactly the same result as what your code does:

  state_function(state, weight, value)
      state[0] = state[0] + weight
      state[1] = state[1] + weight*value

  final_function(state)
      return state[1] / state[0]

The state[0] is the same as your _sumedWeight. Because it's a constant in the second loop you can do the division after. state[1] is your _sumedWxV times _sumedWeight.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
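In PL/pgSQL, Martijn's pseudocode comes out roughly as below (names invented; Justin's final version was sent as an attachment and may differ). PostgreSQL arrays are 1-based, so state[1] and state[2] here play the roles of state[0] and state[1] above. The state stays a fixed two elements long no matter how many rows are aggregated, which is where the speedup comes from.

CREATE OR REPLACE FUNCTION wavg_sf(state numeric[], weight numeric, value numeric)
RETURNS numeric[] AS $$
BEGIN
    RETURN ARRAY[state[1] + weight, state[2] + weight * value];
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION wavg_ff(state numeric[])
RETURNS numeric AS $$
BEGIN
    IF state[1] = 0 THEN
        RETURN NULL;            -- no rows, or zero total weight
    END IF;
    RETURN state[2] / state[1];
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE wavg(numeric, numeric) (
    sfunc     = wavg_sf,
    stype     = numeric[],
    initcond  = '{0,0}',
    finalfunc = wavg_ff
);

-- usage: SELECT wavg(weight, value) FROM samples;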
Thanks. It now takes 806 ms.
Martijn van Oosterhout wrote:
> On Mon, Mar 10, 2008 at 10:01:47AM -0500, Justin wrote:
>> I wish that could work, but I need to keep track of the individual
>> weights, as each is a percentage of the total amount of the weight.
>
> Unless you have a different meaning of weighted average to me, I don't
> think you do. AFAIK this would produce exactly the same result as what
> your code does:
>
>   state_function(state, weight, value)
>       state[0] = state[0] + weight
>       state[1] = state[1] + weight*value
>
>   final_function(state)
>       return state[1] / state[0]
>
> The state[0] is the same as your _sumedWeight. Because it's a constant
> in the second loop you can do the division after. state[1] is your
> _sumedWxV times _sumedWeight.
>
> Have a nice day,