Thread: Calculating a moving average
Why not use a cursor?
P.S.
A moving average will look much better if you Hahn the tails.
To do a normal 7 point moving average, you take (x[i]+ x[i+1]+ x[i+2]+ x[i+3]+ x[i+4]+ x[i+5]+ x[i+6])/7 as point xprime[i] and (y[i]+ y[i+1]+ y[i+2]+ y[i+3]+ y[i+4]+ y[i+5]+ y[i+6])/7 as point yprime[i].
But when you start let x[0], y[0] be the first point, and then average the next 2 for the second point… until you get to 7 and then do the same thing in the other end. Otherwise, you get a very strange looking curve.
IMO-YMMV
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Vanole, Mike
Sent: Wednesday, January 19, 2005 1:34 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Calculating a moving average
Hi,
I need to calculate a moving average and I would like to do it with SQL, or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible in Pg without a bunch of self joins, or is there a funtion available?
Thanks,
Mike
Also, if you don't need an exact moving average, you might consider a weighted mean. Something like: mean = mean * 0.9 + new_value * 0.1 Much easier to maintain than a moving average. On Thu, Jan 20, 2005 at 08:40:24PM -0800, Dann Corbit wrote: > Why not use a cursor? > > > > P.S. > > A moving average will look much better if you Hahn the tails. > > > To do a normal 7 point moving average, you take (x[i]+ x[i+1]+ x[i+2]+ > x[i+3]+ x[i+4]+ x[i+5]+ x[i+6])/7 as point xprime[i] and (y[i]+ y[i+1]+ > y[i+2]+ y[i+3]+ y[i+4]+ y[i+5]+ y[i+6])/7 as point yprime[i]. > > > > But when you start let x[0], y[0] be the first point, and then average > the next 2 for the second point... until you get to 7 and then do the > same thing in the other end. Otherwise, you get a very strange looking > curve. > > > > IMO-YMMV > > ________________________________ > > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Vanole, Mike > Sent: Wednesday, January 19, 2005 1:34 PM > To: pgsql-general@postgresql.org > Subject: [GENERAL] Calculating a moving average > > > > Hi, > > > > I need to calculate a moving average and I would like to do it with SQL, > or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible > in Pg without a bunch of self joins, or is there a funtion available? > > > > Thanks, > > Mike > -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
"Vanole, Mike" <Mike.Vanole@cingular.com> writes: > I need to calculate a moving average and I would like to do it with SQL, > or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible > in Pg without a bunch of self joins, or is there a funtion available? Unfortunately moving averages fall into a class of functions, called analytic functions (at least that's what Oracle calls them) that are inherently hard to model efficiently in SQL. Postgres doesn't have any special support for this set of functions, so you're stuck doing it the inefficient ways that standard SQL allows. I think this is even hard to implement correctly using Postgres's extremely extensible function support. Even if you implemented it in Perl or Python I don't think there's any way to allocate a temporary static storage area for a given call site. So your moving average function would behave strangely if you called it twice in a given query. But if you can work within that caveat it should be straightforward to implement it efficiently in Perl or Python. Alternatively you can write a plpgsql function to calculate the specific moving average you need that does the select itself. -- greg
On Fri, Jan 21, 2005 at 12:53:45AM -0500, Greg Stark wrote: > "Vanole, Mike" <Mike.Vanole@cingular.com> writes: > > > I need to calculate a moving average and I would like to do it with SQL, > > or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible > > in Pg without a bunch of self joins, or is there a funtion available? > > Unfortunately moving averages fall into a class of functions, called analytic > functions (at least that's what Oracle calls them) that are inherently hard to > model efficiently in SQL. Postgres doesn't have any special support for this > set of functions, so you're stuck doing it the inefficient ways that standard > SQL allows. > > I think this is even hard to implement correctly using Postgres's extremely > extensible function support. Even if you implemented it in Perl or Python I > don't think there's any way to allocate a temporary static storage area for a > given call site. So your moving average function would behave strangely if you > called it twice in a given query. > > But if you can work within that caveat it should be straightforward to > implement it efficiently in Perl or Python. Alternatively you can write a > plpgsql function to calculate the specific moving average you need that does > the select itself. If you're feeling adventurous, you might look at Oracle's documentation on their analytic functions and see if you can come up with something generic for PostgreSQL. Even if you only do a moving average function it would be a good start. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <decibel@decibel.org> writes: > If you're feeling adventurous, you might look at Oracle's documentation > on their analytic functions and see if you can come up with something > generic for PostgreSQL. I think the hard part of doing even a simple implementation is precisely the point I raised about doing it in Perl or Python. Somehow you have to allocate a static storage area specific to the call site. It's sort of like an aggregate function call except of course that you're going to return a datum for every record. For a fuller implementation there are a lot more details. If I understand correctly in Oracle you get to specify an ORDER BY clause and the equivalent of a GROUP BY clause in the analytic function call. I think each call site can even have its own order and grouping. > Even if you only do a moving average function it would be a good start. Actually my pet one would be a "rank" function. So you could do something like "return the top 3 scoring players from each team". Currently the suggested way to do it is by using an aggregate function to gather up the data in an array. -- greg
If someone wanted to put arbitrary aggregates into PostgreSQL, I would suggest something akin to the "RED BRICK" API, or better yet, the ATLAS API: http://magna.cs.ucla.edu/atlas/
"Dann Corbit" <DCorbit@connx.com> writes: > If someone wanted to put arbitrary aggregates into PostgreSQL, I would > suggest something akin to the "RED BRICK" API, or better yet, the ATLAS > API: I also found a good reference for the DB2's SQL2003 Standard OLAP functions: http://publib.boulder.ibm.com/infocenter/rb63help/topic/com.ibm.redbrick.doc6.3/sqlrg/sqlrg36.htm -- greg
Unless I'm grossly misunderstanding the problem i think that a trigger written in PL/pgsql would work fine. Something like this: CREATE TABLE foo ( foo_id SERIAL primary key, foo TEXT); CREATE TABLE bar ( foo_id INTEGER references foo, bar_id SERIAL primary key, bar DOUBLE PRECISION NOT NULL); CREATE TABLE bar_avg ( foo_id INTEGER primary key references foo, bar_avg DOUBLE PRECISION); CREATE OR REPLACE FUNCTION get_bar_avg() RETURNS TRIGGER AS ' DECLARE bar_record RECORD; x INTEGER; y DOUBLE PRECISION := 0; BEGIN IF TG_OP = ''INSERT'' THEN y := y + NEW.bar; x := 1; FOR bar_record IN SELECT * FROM bar LOOP IF NEW.foo_id = bar_record.foo_id THEN y := y + bar_record.bar; x := x + 1; END IF; END LOOP; y := y/x; IF EXISTS(SELECT * FROM bar_avg WHERE foo_id = NEW.foo_id) THEN UPDATE bar_avg SET bar_avg.bar_avg = y WHERE foo_id = NEW.foo_id; ELSE INSERT INTO bar_avg VALUES (NEW.foo_id, y); END IF; RETURN NEW; ELSIF TG_OP = ''DELETE'' THEN x := 0; FOR bar_record IN SELECT * FROM bar LOOP IF OLD.foo_id = bar_record.foo_id AND OLD.bar_id <> bar_record.bar_id THEN y := y + bar_record.bar; x := x + 1; END IF; END LOOP; IF x <> 0 THEN y := y/x; END IF; UPDATE bar_avg SET bar_avg.bar_avg = y WHERE foo_id = OLD.foo_id; RETURN OLD; ELSE y := y + NEW.bar; x := 1; FOR bar_record IN SELECT * FROM bar LOOP IF OLD.bar_id <> bar_record.bar_id THEN y := y + bar_record.bar; x := x + 1; END IF; END LOOP; y := y/x; UPDATE bar_avg SET bar_avg.bar_avg = y WHERE foo_id = OLD.foo_id; RETURN NEW; END IF; END; ' LANGUAGE plpgsql; CREATE TRIGGER get_bar_avg BEFORE INSERT OR DELETE OR UPDATE ON bar FOR EACH ROW EXECUTE PROCEDURE get_bar_avg(); That should work, regards, matt Quoting "Jim C. Nasby" <decibel@decibel.org>: > On Fri, Jan 21, 2005 at 12:53:45AM -0500, Greg Stark wrote: > > "Vanole, Mike" <Mike.Vanole@cingular.com> writes: > > > > > I need to calculate a moving average and I would like to do it with > SQL, > > > or a Pg function built for this purpose. I'm on Pg 7.4. Is this > possible > > > in Pg without a bunch of self joins, or is there a funtion available? > > > > Unfortunately moving averages fall into a class of functions, called > analytic > > functions (at least that's what Oracle calls them) that are inherently hard > to > > model efficiently in SQL. Postgres doesn't have any special support for > this > > set of functions, so you're stuck doing it the inefficient ways that > standard > > SQL allows. > > > > I think this is even hard to implement correctly using Postgres's > extremely > > extensible function support. Even if you implemented it in Perl or Python > I > > don't think there's any way to allocate a temporary static storage area for > a > > given call site. So your moving average function would behave strangely if > you > > called it twice in a given query. > > > > But if you can work within that caveat it should be straightforward to > > implement it efficiently in Perl or Python. Alternatively you can write a > > plpgsql function to calculate the specific moving average you need that > does > > the select itself. > > If you're feeling adventurous, you might look at Oracle's documentation > on their analytic functions and see if you can come up with something > generic for PostgreSQL. Even if you only do a moving average function it > would be a good start. > -- > Jim C. Nasby, Database Consultant decibel@decibel.org > Give your computer some brain candy! www.distributed.net Team #1828 > > Windows: "Where do you want to go today?" > Linux: "Where do you want to go tomorrow?" > FreeBSD: "Are you guys coming, or what?" > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
mstory@uchicago.edu wrote: > CREATE OR REPLACE FUNCTION get_bar_avg() RETURNS TRIGGER AS ' > DECLARE > bar_record RECORD; > x INTEGER; > y DOUBLE PRECISION := 0; > BEGIN > IF TG_OP = ''INSERT'' THEN > y := y + NEW.bar; ... > RETURN NEW; > ELSIF TG_OP = ''DELETE'' THEN > x := 0; ... > RETURN OLD; > ELSE > y := y + NEW.bar; ... > RETURN NEW; > END IF; > END; > ' LANGUAGE plpgsql; I see people do this from time to time. Just out of curiosity, is this considered good coding style, or is it considered "lazyness"? I'm not sure what to think of it. If I would have written this, there would have been 3 triggers w/o the check on TG_OP. Is there an important drawback to doing so? Is there any document on "preferred" coding style in PL/PGSQL? Yes, I'm a bit of a purist... -- Regards, Alban Hertroys MAG Productions T: +31(0)53 4346874 F: +31(0)53 4346876 E: alban@magproductions.nl W: http://www.magproductions.nl
On Mon, 24 Jan 2005 08:32 pm, Alban Hertroys wrote: > mstory@uchicago.edu wrote: > > CREATE OR REPLACE FUNCTION get_bar_avg() RETURNS TRIGGER AS ' > > DECLARE > > bar_record RECORD; > > x INTEGER; > > y DOUBLE PRECISION := 0; > > BEGIN > > IF TG_OP = ''INSERT'' THEN > > y := y + NEW.bar; > ... > > RETURN NEW; > > ELSIF TG_OP = ''DELETE'' THEN > > x := 0; > ... > > RETURN OLD; > > ELSE > > y := y + NEW.bar; > ... > > RETURN NEW; > > END IF; > > END; > > ' LANGUAGE plpgsql; > > I see people do this from time to time. Just out of curiosity, is this > considered good coding style, or is it considered "lazyness"? I'm not > sure what to think of it. > > If I would have written this, there would have been 3 triggers w/o the > check on TG_OP. Is there an important drawback to doing so? Is there any > document on "preferred" coding style in PL/PGSQL? > > Yes, I'm a bit of a purist... > Given you have to define a function for each trigger, my view is why write more functions. Along with this. As a C programmer, I would do a few more IF tests in a function, rather than write another one. I find that triggers like this are one functional block and all go together. Then when you update the function, it's all in one place. Others may have "better" reasons for why they do it the way they do. But they are mine. Regards Russell Smith.
I personally use 3 seperate triggers on most occasions, depending on how different the action for each seperate action is, it's just easier for me and my people to logically distinguish the functions that way, but the example in the 7.4 documentation for triggers is given using the form that i wrote this function in, and is useful when the trigger procedures for all 3 actions do roughly the same thing, it also makes editing the code a little easier in this situation. regards, matt Quoting Russell Smith <mr-russ@pws.com.au>: > On Mon, 24 Jan 2005 08:32 pm, Alban Hertroys wrote: > > mstory@uchicago.edu wrote: > > > CREATE OR REPLACE FUNCTION get_bar_avg() RETURNS TRIGGER AS ' > > > DECLARE > > > bar_record RECORD; > > > x INTEGER; > > > y DOUBLE PRECISION := 0; > > > BEGIN > > > IF TG_OP = ''INSERT'' THEN > > > y := y + NEW.bar; > > ... > > > RETURN NEW; > > > ELSIF TG_OP = ''DELETE'' THEN > > > x := 0; > > ... > > > RETURN OLD; > > > ELSE > > > y := y + NEW.bar; > > ... > > > RETURN NEW; > > > END IF; > > > END; > > > ' LANGUAGE plpgsql; > > > > I see people do this from time to time. Just out of curiosity, is this > > considered good coding style, or is it considered "lazyness"? I'm not > > sure what to think of it. > > > > If I would have written this, there would have been 3 triggers w/o the > > check on TG_OP. Is there an important drawback to doing so? Is there any > > document on "preferred" coding style in PL/PGSQL? > > > > Yes, I'm a bit of a purist... > > > Given you have to define a function for each trigger, my view is why write > more functions. > > Along with this. As a C programmer, I would do a few more IF tests in a > function, rather than > write another one. I find that triggers like this are one functional block > and all go together. > Then when you update the function, it's all in one place. > > Others may have "better" reasons for why they do it the way they do. But > they are mine. > > Regards > > Russell Smith. > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
How do you do a self-join on a subselect ? like SELECT a.x+b.x FROM (subselect) a, (subselect) b WHERE a.id = b.id+10 but without performing the subselect twice ..................................?
Make a plpgsql function which will iterate over the rows on which the moving average is to be done (FOR row IN SELECT), of course use the correct order, then use an array as a FIFO, add a row to the moving average and push it, pop the old one and substract it. Roundoff errors will bite your nether regions. I spose the purpose of this is not to fetch the whole thing so that your moving average will not have one result row per source row (or you'd do it in the application), thus you can pre-shrink your dataset by putting some avg() and group by in your source select. > Hi, > I need to calculate a moving average and I would like to do it with SQL, > or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible > in Pg without a bunch of self joins, or is there a funtion available? > Thanks, > Mike
It's not at all clear what you're asking. Do you have a real example, preferably with EXPLAIN output? On Wed, Jan 26, 2005 at 11:12:25PM +0100, PFC wrote: > > How do you do a self-join on a subselect ? > > like > SELECT a.x+b.x FROM (subselect) a, (subselect) b WHERE a.id = b.id+10 > > but without performing the subselect twice > ..................................? > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Wed, Jan 26, 2005 at 23:12:25 +0100, PFC <lists@boutiquenumerique.com> wrote: > > How do you do a self-join on a subselect ? > > like > SELECT a.x+b.x FROM (subselect) a, (subselect) b WHERE a.id = b.id+10 > > but without performing the subselect twice > ..................................? You have to write the subselect twice if you want to match up different rows. In some cases it might be worth making a copy in a temp table. For simple subselects where there is an index on id, leaving it as is should work fine.
Thanks, I was hoping there was some way to avoid it. > You have to write the subselect twice if you want to match up different > rows. > In some cases it might be worth making a copy in a temp table. For simple > subselects where there is an index on id, leaving it as is should work > fine. >