Thread: how do you write an aggregate function
The help is not real clear, nor can I find examples of how to write an aggregate function. I searched the archive of the mailing list and did not find anything.

I need to write a weighted average function, so the need to keep track of multiple variables between function calls is a must.

I see how the sfunc ( internal-state, next-data-values ) keeps track of the passed values with the internal state. Can the internal state be an array type where I just add new entries to a multi-dimensional array, then do all the math in the ffunc?

One person pointed out the pl/r plugin, but what I'm working on gets tied into the PostBooks/OpenMfg project, and they are not real big fans of adding another language and plugin they have to make sure their customers have deployed (and I'm one of those customers). PostBooks/OpenMfg will want all the functions in pl/pgsql; does this present any major problems?
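For reference, a minimal sketch of the pieces being asked about: an aggregate whose internal state is an array, with the math done in the final function. All names here (simple_avg and its support functions) are invented for illustration and are not from this thread.

-- state function: the state really can be an array; array_append grows it
CREATE OR REPLACE FUNCTION simple_avg_sf(state numeric[], val numeric)
RETURNS numeric[] AS $$
BEGIN
    RETURN array_append(state, val);
END;
$$ LANGUAGE plpgsql;

-- final function: loop over the accumulated values and do the math once
CREATE OR REPLACE FUNCTION simple_avg_ff(state numeric[])
RETURNS numeric AS $$
DECLARE
    total numeric := 0;
BEGIN
    IF state IS NULL OR array_upper(state, 1) IS NULL THEN
        RETURN NULL;                       -- no input rows
    END IF;
    FOR i IN array_lower(state, 1) .. array_upper(state, 1) LOOP
        total := total + state[i];
    END LOOP;
    RETURN total / (array_upper(state, 1) - array_lower(state, 1) + 1);
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE simple_avg(numeric) (
    sfunc     = simple_avg_sf,
    stype     = numeric[],
    initcond  = '{}',
    finalfunc = simple_avg_ff
);

-- usage: SELECT simple_avg(some_numeric_column) FROM some_table;

As the rest of the thread shows, letting the state array grow with the number of input rows turns out to be the performance problem.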
Justin wrote:
> The help is not real clear nor can i find examples how to write an
> aggregate function.

Examples: http://www.postgresql.org/docs/8.2/static/xaggr.html

> I searched the archive of the mail list and did not find anything

The online documentation is excellent for these sorts of things.

> I need to write Weighted Average function so the need to keep track of
> multiple variables between function calls is must?

You don't necessarily need an aggregate function to do this. A weighted average takes several independent variables, weights them based on some constant (usually the difference between a static time and the time at which the data were recorded), and returns a value [0]. Maintaining state between calls is probably going to be more trouble than it's worth, especially if you're recomputing the weights all the time... which, in most cases, is what happens.

I perform exponential moving average analysis of stock market and trading data, for which I have a table that contains columns like the following (these data are not intended to resemble the performance of any particular security).

 id | time  | price
----+-------+-------
  1 | 09:30 | 89.54
  2 | 09:31 | 89.58
  3 | 09:32 | 89.53
  4 | 09:33 | 89.5
  5 | 09:34 | 89.51
  6 | 09:35 | 89.5
  7 | 09:36 | 89.42
  8 | 09:37 | 89.44

When I compute the exponential average of these data, I'm always looking at the "most recent" X prices, as I loop over all the rows in which I'm interested. Which means I need to recompute the weighted values for every minute of data (in the case of this sample table, anyway). Maintaining state for that sort of calculation wouldn't be worth the overhead.

I suggest writing a function (in PL/pgSQL or whatever your favorite flavor is) that performs a query to retrieve all the rows you need and outputs a SETOF data that contains the weighted averages. If you only need one average at a time, just return a single value instead of a SETOF values.

I hope this helps, but in case it doesn't, you should probably give us a little more detail about what you're actually trying to do.

Colin

[0] Using a formula like this:
<http://en.wikipedia.org/wiki/Moving_average#Weighted_moving_average>
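A rough sketch of the kind of function Colin describes, in the single-value form. The prices table and its id/time/price columns follow his sample above; the function name and the simple "newer rows weigh more" scheme are invented for illustration.

CREATE OR REPLACE FUNCTION weighted_avg_price(p_from time, p_to time)
RETURNS numeric AS $$
DECLARE
    r      record;
    w      numeric := 0;   -- running weight: newer rows get larger weights
    sum_w  numeric := 0;   -- sum of weights
    sum_wv numeric := 0;   -- sum of weight * price
BEGIN
    FOR r IN
        SELECT price
        FROM prices
        WHERE "time" BETWEEN p_from AND p_to
        ORDER BY "time"
    LOOP
        w      := w + 1;
        sum_w  := sum_w + w;
        sum_wv := sum_wv + w * r.price;
    END LOOP;

    IF sum_w = 0 THEN
        RETURN NULL;       -- no rows in the window
    END IF;
    RETURN sum_wv / sum_w;
END;
$$ LANGUAGE plpgsql;

-- usage: SELECT weighted_avg_price('09:30', '09:37');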
On Tue, Mar 4, 2008 at 8:44 AM, Justin <justin@emproshunts.com> wrote:
> I searched the archive of the mail list and did not find anything

I don't know if you've already seen this, but this is the question that I asked a while back:

http://archives.postgresql.org/pgsql-general/2007-12/msg00681.php

Regards,
Richard Broersma Jr.
Richard Broersma wrote:
> On Tue, Mar 4, 2008 at 8:44 AM, Justin <justin@emproshunts.com> wrote:
>> I searched the archive of the mail list and did not find anything

Search the documentation. There are a couple great examples posted at

http://www.postgresql.org/docs/8.0/interactive/sql-createaggregate.html
Thank you. I was looking for examples and never thought to look at the user comments; I searched the help files several times but never thought to look at the user-comment help online, and the help that comes with PgAdmin III does not include the user comments.

Once I get the aggregate function working I will add it to the comment section or post it back here, since it will be a more complicated aggregate function than what is listed there with the sfunc and ffunc calls.

btober@ct.metrocast.net wrote:
> Richard Broersma wrote:
>> On Tue, Mar 4, 2008 at 8:44 AM, Justin <justin@emproshunts.com> wrote:
>>> I searched the archive of the mail list and did not find anything
>
> Search the documentation. There are a couple great examples posted at
>
> http://www.postgresql.org/docs/8.0/interactive/sql-createaggregate.html
I got the aggregate function for weighted average done. I was finally left alone for more than 10 minutes to actually get it written. It takes two input values, the Weight and the Value. It sums the weighted entries to create the constant, then does the normal formula, but returns an averaged number rather than a percentage; a simple change on the return line and it can do percentages.

I did a couple of things a little odd: instead of a multi-dimensional array I used a one-dimensional array where the first entry of each pair is the Weight and the second is the Value. This made the loop through the array look stupid.

I tested it across 50,000 records; with a GROUP BY it took 3.3 seconds to run. Without the GROUP BY clause performance is terrible, taking several minutes just to do the sfunc part: 371,563 ms.

The array seems to be the performance hit. Any advice? It could be the way I'm appending to the array, which has a performance hit as the array gets bigger and bigger.

btober@ct.metrocast.net wrote:
> Richard Broersma wrote:
>> On Tue, Mar 4, 2008 at 8:44 AM, Justin <justin@emproshunts.com> wrote:
>>> I searched the archive of the mail list and did not find anything
>
> Search the documentation. There are a couple great examples posted at
>
> http://www.postgresql.org/docs/8.0/interactive/sql-createaggregate.html
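Justin's code went out as an attachment and is not in the archive; a rough reconstruction of the interleaved-array approach he describes might look like the following. The state function is essentially the one he posts later in the thread (with named parameters instead of $1/$2/$3); the final function and the CREATE AGGREGATE are guesses.

-- state: a flat numeric[] holding weight, value, weight, value, ...
CREATE OR REPLACE FUNCTION wcost_average_sf(state numeric[], weight numeric, value numeric)
RETURNS numeric[] AS $$
BEGIN
    RETURN array_append(array_append(state, weight), value);
END;
$$ LANGUAGE plpgsql;

-- walk the array two entries at a time and do the math at the end
CREATE OR REPLACE FUNCTION wcost_average_ff(state numeric[])
RETURNS numeric AS $$
DECLARE
    i      integer;
    sum_w  numeric := 0;
    sum_wv numeric := 0;
BEGIN
    IF state IS NULL OR array_upper(state, 1) IS NULL THEN
        RETURN NULL;
    END IF;
    i := array_lower(state, 1);
    WHILE i < array_upper(state, 1) LOOP
        sum_w  := sum_w  + state[i];                 -- weight
        sum_wv := sum_wv + state[i] * state[i + 1];  -- weight * value
        i := i + 2;
    END LOOP;
    IF sum_w = 0 THEN
        RETURN NULL;
    END IF;
    RETURN sum_wv / sum_w;   -- change this line to return a percentage instead
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE wcost_average(numeric, numeric) (
    sfunc     = wcost_average_sf,
    stype     = numeric[],
    initcond  = '{}',
    finalfunc = wcost_average_ff
);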
Array appends are usually a performance hit, as you said; I'm not sure, though, with PostgreSQL. Why not try it with two arrays and see what happens? At least you would be reducing the single array and eliminating the append.

danap.

> I got the aggregate function for weighted average done. I was finally
> left alone for more than 10 minutes to actually get it written. It
> takes two input values, the Weight and the Value. It sums the weighted
> entries to create the constant, then does the normal formula, but
> returns an averaged number rather than a percentage; a simple change on
> the return line and it can do percentages.
>
> I did a couple of things a little odd: instead of a multi-dimensional
> array I used a one-dimensional array where the first entry of each pair
> is the Weight and the second is the Value. This made the loop through
> the array look stupid.
>
> I tested it across 50,000 records; with a GROUP BY it took 3.3 seconds
> to run.
>
> Without the GROUP BY clause performance is terrible, taking several
> minutes just to do the sfunc part: 371,563 ms.
>
> The array seems to be the performance hit. Any advice? It could be the
> way I'm appending to the array, which has a performance hit as the
> array gets bigger and bigger.
If I use two arrays, how do I append new entries into these arrays? array_append only allows for one dimension. What other methods are there to add new entries to the array?

dmp wrote:
> Array appends are usually a performance hit, as you said; I'm not sure,
> though, with PostgreSQL. Why not try it with two arrays and see what
> happens? At least you would be reducing the single array and
> eliminating the append.
>
> danap.
>
>> I got the aggregate function for weighted average done. I was finally
>> left alone for more than 10 minutes to actually get it written. It
>> takes two input values, the Weight and the Value. It sums the weighted
>> entries to create the constant, then does the normal formula, but
>> returns an averaged number rather than a percentage; a simple change
>> on the return line and it can do percentages.
>>
>> I did a couple of things a little odd: instead of a multi-dimensional
>> array I used a one-dimensional array where the first entry of each
>> pair is the Weight and the second is the Value. This made the loop
>> through the array look stupid.
>>
>> I tested it across 50,000 records; with a GROUP BY it took 3.3 seconds
>> to run.
>>
>> Without the GROUP BY clause performance is terrible, taking several
>> minutes just to do the sfunc part: 371,563 ms.
>>
>> The array seems to be the performance hit. Any advice? It could be
>> the way I'm appending to the array, which has a performance hit as
>> the array gets bigger and bigger.
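Two ways this could be done, sketched here with invented names (none of this is from the thread): keep the two arrays as fields of a composite state type, so each stays one-dimensional and array_append still applies; or build a genuinely multi-dimensional array, for which the || operator or array_cat is used rather than array_append.

-- option 1: two parallel one-dimensional arrays wrapped in a composite type
CREATE TYPE wavg_pairs_state AS (weights numeric[], vals numeric[]);

CREATE OR REPLACE FUNCTION wavg_pairs_sf(state wavg_pairs_state, weight numeric, value numeric)
RETURNS wavg_pairs_state AS $$
BEGIN
    state.weights := array_append(state.weights, weight);
    state.vals    := array_append(state.vals, value);
    RETURN state;
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE wavg_pairs(numeric, numeric) (
    sfunc    = wavg_pairs_sf,
    stype    = wavg_pairs_state,
    initcond = '("{}","{}")'
    -- a final function looping over state.weights / state.vals would do the math
);

-- option 2: a true two-dimensional array; append a one-row sub-array with ||
-- SELECT ARRAY[[1,2]] || ARRAY[3,4];   yields {{1,2},{3,4}}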
Justin,

I'm really not an expert in this area, on how to code this, or on functions in PostgreSQL. All I can offer is some ideas which you might try; sometimes this is all it takes. Perhaps someone else will respond who might be more helpful. With that said, I have read in the documentation about the use of other languages, and if you are looking for higher performance, that is perhaps the way it is going to come about.

danap.

> DMP, you did give me an idea on changing how to call the array append.
> The sfunc now looks like this:
>
> create or replace function wcost_average_sf (numeric[], numeric, numeric)
> returns numeric[] as
> $Body$
> begin
>     return array_append(array_append($1, $2), $3);
> end;
> $Body$
> LANGUAGE 'plpgsql' VOLATILE;
>
> This yanked 140,000 ms out of the run time, a big improvement, but
> nowhere near where I want it to be.
>
> Are there speed improvements in the other languages, like TCL?
>
> dmp wrote:
>> Array appends are usually a performance hit, as you said; I'm not
>> sure, though, with PostgreSQL. Why not try it with two arrays and see
>> what happens? At least you would be reducing the single array and
>> eliminating the append.
>>
>> danap.
>>
>>> I got the aggregate function for weighted average done. I was finally
>>> left alone for more than 10 minutes to actually get it written. It
>>> takes two input values, the Weight and the Value. It sums the
>>> weighted entries to create the constant, then does the normal
>>> formula, but returns an averaged number rather than a percentage; a
>>> simple change on the return line and it can do percentages.
>>>
>>> I did a couple of things a little odd: instead of a multi-dimensional
>>> array I used a one-dimensional array where the first entry of each
>>> pair is the Weight and the second is the Value. This made the loop
>>> through the array look stupid.
>>>
>>> I tested it across 50,000 records; with a GROUP BY it took 3.3
>>> seconds to run.
>>>
>>> Without the GROUP BY clause performance is terrible, taking several
>>> minutes just to do the sfunc part: 371,563 ms.
>>>
>>> The array seems to be the performance hit. Any advice? It could be
>>> the way I'm appending to the array, which has a performance hit as
>>> the array gets bigger and bigger.
On Sun, Mar 09, 2008 at 11:40:47AM -0500, Justin wrote:
> I got the aggregate function for weighted average done. I was finally
> left alone for more than 10 minutes to actually get it written. It
> takes two input values, the Weight and the Value. It sums the weighted
> entries to create the constant, then does the normal formula, but
> returns an averaged number rather than a percentage; a simple change on
> the return line and it can do percentages.

If you're looking for performance, ISTM the best option would be to simply accumulate the weights and value*weight as you go and do a division at the end. That seems likely to beat any implementation involving array_append.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
I wish that could work, but I need to keep track of the individual weights, as each is a percentage of the total amount of the weight.

I would never have thought the performance of the Array would suck this bad.

Martijn van Oosterhout wrote:
> On Sun, Mar 09, 2008 at 11:40:47AM -0500, Justin wrote:
>> I got the aggregate function for weighted average done. I was finally
>> left alone for more than 10 minutes to actually get it written. It
>> takes two input values, the Weight and the Value. It sums the weighted
>> entries to create the constant, then does the normal formula, but
>> returns an averaged number rather than a percentage; a simple change
>> on the return line and it can do percentages.
>
> If you're looking for performance, ISTM the best option would be to
> simply accumulate the weights and value*weight as you go and do a
> division at the end. That seems likely to beat any implementation
> involving array_append.
>
> Have a nice day,
>> A couple of email pointers:
>>
>> * Please don't use language like "the Array would suck this bad" in
>> lists. I swear like a sailor in person, but it reads *very* badly in
>> email. There are a lot of people on this list who might take offense,
>> since it is their hard work that means that arrays work at all for
>> you...

That comment was not meant to be an insult or disparaging in any way whatsoever. If it was taken as such, then I'm sorry.
It seems the biggest performance hit is the copying of the array contents from one memory variable to another, which is happening a lot.

I'm not really against using temp tables to hold onto values. I used to do that in FoxPro when I hit the hard limit on its arrays, but other problems start popping up. If we use a temp table, keeping track of what is going on with other users can make life fun.

I really want to figure out how to speed this up. I have to write a lot more aggregate functions to analyze R&D data later this year; right now this function will be used in calculating manufacturing cost.
Webb Sprague wrote:
> A couple of email pointers:
>
> * Please don't use language like "the Array would suck this bad" in
> lists. I swear like a sailor in person, but it reads *very* badly in
> email. There are a lot of people on this list who might take offense,
> since it is their hard work that means that arrays work at all for
> you...
>
> * Please don't topquote.
>
> On Mon, Mar 10, 2008 at 8:01 AM, Justin <justin@emproshunts.com> wrote:
>> I wish that could work, but I need to keep track of the individual
>> weights, as each is a percentage of the total amount of the weight.
>
> The definition of "weighted mean" doesn't require that the weights be
> normalized. You probably need to calculate this separately, either
> using a temp table or a column in the original table.
>
>> I would never have thought the performance of the Array would suck
>> this bad.
>
> You should be using a temp table or a new column as above to store the
> normalized weights, and then calling the original aggregate on those:
> (a) calculate the sum of weights (aggregate), (b) calculate the
> normalized weight for each row (weight / total), (c) find the weighted
> mean with an aggregate that retains only the totals between function
> calls.
>
> If you let your arrays grow to the size of tables, performance will
> suffer (as they are not meant to do that anyway). I don't think you can
> avoid two passes, whether inside a function or outside.
>
>> Martijn van Oosterhout wrote:
>>> On Sun, Mar 09, 2008 at 11:40:47AM -0500, Justin wrote:
>>>> I got the aggregate function for weighted average done. I was
>>>> finally left alone for more than 10 minutes to actually get it
>>>> written. It takes two input values, the Weight and the Value. It
>>>> sums the weighted entries to create the constant, then does the
>>>> normal formula, but returns an averaged number rather than a
>>>> percentage; a simple change on the return line and it can do
>>>> percentages.
>>>
>>> If you're looking for performance, ISTM the best option would be to
>>> simply accumulate the weights and value*weight as you go and do a
>>> division at the end. That seems likely to beat any implementation
>>> involving array_append.
>>>
>>> Have a nice day,
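Webb's steps (a) through (c) amount to something like the following plain SQL, sketched with an invented samples(weight, value) table.

-- (a) total weight, kept in a temp table as suggested
CREATE TEMP TABLE w_total AS
    SELECT sum(weight) AS total FROM samples;

-- (b) + (c) normalize each weight against the total and take the weighted mean
SELECT sum((s.weight / t.total) * s.value) AS weighted_mean
FROM samples s, w_total t;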
On Mon, Mar 10, 2008 at 11:00 AM, Justin <justin@emproshunts.com> wrote:
> That comment was not meant to be an insult or disparaging in any way
> whatsoever. If it was taken as such, then I'm sorry.

I am sure it would have been fine in person, I just think over email it sounded abrasive. But could you stop topquoting please?

> It seems the biggest performance hit is the copying of the array
> contents from one memory variable to another, which is happening a lot.

Yeah, I think arrays just can't handle a whole lot of data, that is all. They are "tricky", and shouldn't be used for heavy lifting (more than 1k of elements feels like you are asking for trouble).

> I'm not really against using temp tables to hold onto values. I used to
> do that in FoxPro when I hit the hard limit on its arrays, but other
> problems start popping up. If we use a temp table, keeping track of
> what is going on with other users can make life fun.

I think temp tables have scope, though you should test this, so that you can use them with impunity in functions and not worry with multiple users.

> I really want to figure out how to speed this up. I have to write a lot
> more aggregate functions to analyze R&D data later this year; right now
> this function will be used in calculating manufacturing cost.

I think a combination of aggregate functions along with some more design would be best. For example: can you have a trigger calculate the normalized weight of a row on insert? Have triggers keep another table with summary information updated as you modify the data? Etc. There is a lot to PG that would help for this kind of thing.
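A sketch of the trigger idea Webb mentions, using the same invented samples(weight, value) table as above plus a one-row summary table; none of these names come from the thread.

-- hypothetical data table and summary table
CREATE TABLE samples (weight numeric, value numeric);

CREATE TABLE weight_summary (
    total_weight numeric NOT NULL DEFAULT 0,
    total_wv     numeric NOT NULL DEFAULT 0
);
INSERT INTO weight_summary DEFAULT VALUES;

-- keep the running totals up to date as rows are added
CREATE OR REPLACE FUNCTION samples_weight_trg() RETURNS trigger AS $$
BEGIN
    UPDATE weight_summary
       SET total_weight = total_weight + NEW.weight,
           total_wv     = total_wv     + NEW.weight * NEW.value;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER samples_weight_ins
    AFTER INSERT ON samples
    FOR EACH ROW EXECUTE PROCEDURE samples_weight_trg();

-- the weighted mean then comes straight from the summary table:
-- SELECT total_wv / total_weight FROM weight_summary;

A production version would also need UPDATE and DELETE triggers to keep the totals honest.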
On Mon, Mar 10, 2008 at 10:01:47AM -0500, Justin wrote:
> I wish that could work, but I need to keep track of the individual
> weights, as each is a percentage of the total amount of the weight.

Unless you have a different meaning of weighted average to me, I don't think you do. AFAIK this would produce exactly the same result as what your code does:

  state_function(state, weight, value)
      state[0] = state[0] + weight
      state[1] = state[1] + weight*value

  final_function(state)
      return state[1] / state[0]

The state[0] is the same as your _sumedWeight. Because it's a constant in the second loop you can do the division after. state[1] is your _sumedWxV times _sumedWeight.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
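In PL/pgSQL, Martijn's pseudocode comes out roughly as below (names invented; Justin's final version was sent as an attachment and may differ). PostgreSQL arrays are 1-based, so state[1] and state[2] here play the roles of state[0] and state[1] above. The state stays a fixed two elements long no matter how many rows are aggregated, which is where the speedup comes from.

CREATE OR REPLACE FUNCTION wavg_sf(state numeric[], weight numeric, value numeric)
RETURNS numeric[] AS $$
BEGIN
    RETURN ARRAY[state[1] + weight, state[2] + weight * value];
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION wavg_ff(state numeric[])
RETURNS numeric AS $$
BEGIN
    IF state[1] = 0 THEN
        RETURN NULL;            -- no rows, or zero total weight
    END IF;
    RETURN state[2] / state[1];
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE wavg(numeric, numeric) (
    sfunc     = wavg_sf,
    stype     = numeric[],
    initcond  = '{0,0}',
    finalfunc = wavg_ff
);

-- usage: SELECT wavg(weight, value) FROM samples;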
Thanks. It now takes 806 ms.
Martijn van Oosterhout wrote:
> On Mon, Mar 10, 2008 at 10:01:47AM -0500, Justin wrote:
>> I wish that could work, but I need to keep track of the individual
>> weights, as each is a percentage of the total amount of the weight.
>
> Unless you have a different meaning of weighted average to me, I don't
> think you do. AFAIK this would produce exactly the same result as what
> your code does:
>
>   state_function(state, weight, value)
>       state[0] = state[0] + weight
>       state[1] = state[1] + weight*value
>
>   final_function(state)
>       return state[1] / state[0]
>
> The state[0] is the same as your _sumedWeight. Because it's a constant
> in the second loop you can do the division after. state[1] is your
> _sumedWxV times _sumedWeight.
>
> Have a nice day,