Thread: Calculating a moving average

Calculating a moving average

From

"Vanole, Mike"

Date:

21 January 2005, 04:23:41

Hi,

I need to calculate a moving average and I would like to do it with SQL, or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible in Pg without a bunch of self joins, or is there a funtion available?

Thanks,

Mike

Re: Calculating a moving average

From

"Dann Corbit"

Date:

21 January 2005, 04:53:01

Why not use a cursor?

P.S.

A moving average will look much better if you Hahn the tails.

To do a normal 7 point moving average, you take (x[i]+ x[i+1]+ x[i+2]+ x[i+3]+ x[i+4]+ x[i+5]+ x[i+6])/7 as point xprime[i] and (y[i]+ y[i+1]+ y[i+2]+ y[i+3]+ y[i+4]+ y[i+5]+ y[i+6])/7 as point yprime[i].

But when you start let x[0], y[0] be the first point, and then average the next 2 for the second point… until you get to 7 and then do the same thing in the other end. Otherwise, you get a very strange looking curve.

IMO-YMMV

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Vanole, Mike
Sent: Wednesday, January 19, 2005 1:34 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Calculating a moving average

Hi,

Thanks,

Mike

Re: Calculating a moving average

From

"Jim C. Nasby"

Date:

21 January 2005, 05:09:53

Also, if you don't need an exact moving average, you might consider a
weighted mean. Something like:

mean = mean * 0.9 + new_value * 0.1

Much easier to maintain than a moving average.

On Thu, Jan 20, 2005 at 08:40:24PM -0800, Dann Corbit wrote:
> Why not use a cursor?
>
>
>
> P.S.
>
> A moving average will look much better if you Hahn the tails.
>
>
> To do a normal 7 point moving average, you take (x[i]+ x[i+1]+ x[i+2]+
> x[i+3]+ x[i+4]+ x[i+5]+ x[i+6])/7 as point xprime[i] and (y[i]+ y[i+1]+
> y[i+2]+ y[i+3]+ y[i+4]+ y[i+5]+ y[i+6])/7 as point yprime[i].
>
>
>
> But when you start let x[0], y[0] be the first point, and then average
> the next 2 for the second point... until you get to 7 and then do the
> same thing in the other end.  Otherwise, you get a very strange looking
> curve.
>
>
>
> IMO-YMMV
>
> ________________________________
>
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Vanole, Mike
> Sent: Wednesday, January 19, 2005 1:34 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Calculating a moving average
>
>
>
> Hi,
>
>
>
> I need to calculate a moving average and I would like to do it with SQL,
> or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible
> in Pg without a bunch of self joins, or is there a funtion available?
>
>
>
> Thanks,
>
> Mike
>

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Calculating a moving average

From

Greg Stark

Date:

21 January 2005, 05:54:01

"Vanole, Mike" <Mike.Vanole@cingular.com> writes:

> I need to calculate a moving average and I would like to do it with SQL,
> or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible
> in Pg without a bunch of self joins, or is there a funtion available?

Unfortunately moving averages fall into a class of functions, called analytic
functions (at least that's what Oracle calls them) that are inherently hard to
model efficiently in SQL. Postgres doesn't have any special support for this
set of functions, so you're stuck doing it the inefficient ways that standard
SQL allows.

I think this is even hard to implement correctly using Postgres's extremely
extensible function support. Even if you implemented it in Perl or Python I
don't think there's any way to allocate a temporary static storage area for a
given call site. So your moving average function would behave strangely if you
called it twice in a given query.

But if you can work within that caveat it should be straightforward to
implement it efficiently in Perl or Python. Alternatively you can write a
plpgsql function to calculate the specific moving average you need that does
the select itself.

--
greg

Re: Calculating a moving average

From

"Jim C. Nasby"

Date:

21 January 2005, 06:13:53

On Fri, Jan 21, 2005 at 12:53:45AM -0500, Greg Stark wrote:
> "Vanole, Mike" <Mike.Vanole@cingular.com> writes:
>
> > I need to calculate a moving average and I would like to do it with SQL,
> > or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible
> > in Pg without a bunch of self joins, or is there a funtion available?
>
> Unfortunately moving averages fall into a class of functions, called analytic
> functions (at least that's what Oracle calls them) that are inherently hard to
> model efficiently in SQL. Postgres doesn't have any special support for this
> set of functions, so you're stuck doing it the inefficient ways that standard
> SQL allows.
>
> I think this is even hard to implement correctly using Postgres's extremely
> extensible function support. Even if you implemented it in Perl or Python I
> don't think there's any way to allocate a temporary static storage area for a
> given call site. So your moving average function would behave strangely if you
> called it twice in a given query.
>
> But if you can work within that caveat it should be straightforward to
> implement it efficiently in Perl or Python. Alternatively you can write a
> plpgsql function to calculate the specific moving average you need that does
> the select itself.

If you're feeling adventurous, you might look at Oracle's documentation
on their analytic functions and see if you can come up with something
generic for PostgreSQL. Even if you only do a moving average function it
would be a good start.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Calculating a moving average

From

Greg Stark

Date:

21 January 2005, 06:31:05

"Jim C. Nasby" <decibel@decibel.org> writes:

> If you're feeling adventurous, you might look at Oracle's documentation
> on their analytic functions and see if you can come up with something
> generic for PostgreSQL.

I think the hard part of doing even a simple implementation is precisely the
point I raised about doing it in Perl or Python. Somehow you have to allocate
a static storage area specific to the call site. It's sort of like an
aggregate function call except of course that you're going to return a datum
for every record.

For a fuller implementation there are a lot more details. If I understand
correctly in Oracle you get to specify an ORDER BY clause and the equivalent
of a GROUP BY clause in the analytic function call. I think each call site can
even have its own order and grouping.

> Even if you only do a moving average function it would be a good start.

Actually my pet one would be a "rank" function. So you could do something like
"return the top 3 scoring players from each team". Currently the suggested way
to do it is by using an aggregate function to gather up the data in an array.

--
greg

Re: Calculating a moving average

From

"Dann Corbit"

Date:

21 January 2005, 06:57:43

If someone wanted to put arbitrary aggregates into PostgreSQL, I would
suggest something akin to the "RED BRICK" API, or better yet, the ATLAS
API:
http://magna.cs.ucla.edu/atlas/

Re: Calculating a moving average

From

Greg Stark

Date:

21 January 2005, 14:27:38

"Dann Corbit" <DCorbit@connx.com> writes:

> If someone wanted to put arbitrary aggregates into PostgreSQL, I would
> suggest something akin to the "RED BRICK" API, or better yet, the ATLAS
> API:

I also found a good reference for the DB2's SQL2003 Standard OLAP functions:

http://publib.boulder.ibm.com/infocenter/rb63help/topic/com.ibm.redbrick.doc6.3/sqlrg/sqlrg36.htm



--
greg

Re: Calculating a moving average

From

mstory@uchicago.edu

Date:

21 January 2005, 19:27:22

Unless I'm grossly misunderstanding the problem i think that a trigger written
in PL/pgsql would work fine.  Something like this:

CREATE TABLE foo (
foo_id SERIAL primary key,
foo TEXT);

CREATE TABLE bar (
foo_id INTEGER references foo,
bar_id SERIAL primary key,
bar DOUBLE PRECISION NOT NULL);

CREATE TABLE bar_avg (
foo_id INTEGER primary key references foo,
bar_avg DOUBLE PRECISION);

CREATE OR REPLACE FUNCTION get_bar_avg() RETURNS TRIGGER AS '
DECLARE
bar_record RECORD;
x INTEGER;
y DOUBLE PRECISION := 0;
BEGIN
     IF TG_OP = ''INSERT'' THEN
          y := y + NEW.bar;
          x := 1;
          FOR bar_record IN SELECT * FROM bar LOOP
               IF NEW.foo_id = bar_record.foo_id THEN
                   y := y + bar_record.bar;
                   x := x + 1;
               END IF;
          END LOOP;
          y := y/x;
          IF EXISTS(SELECT * FROM bar_avg WHERE foo_id = NEW.foo_id) THEN
                UPDATE bar_avg
                       SET bar_avg.bar_avg = y
                       WHERE foo_id = NEW.foo_id;
          ELSE
                INSERT INTO bar_avg VALUES (NEW.foo_id, y);
          END IF;
          RETURN NEW;
     ELSIF TG_OP = ''DELETE'' THEN
           x := 0;
           FOR bar_record IN SELECT * FROM bar LOOP
                 IF OLD.foo_id = bar_record.foo_id AND OLD.bar_id <>
                 bar_record.bar_id THEN
                      y := y + bar_record.bar;
                      x := x + 1;
                 END IF;
           END LOOP;
           IF x <> 0 THEN
                y := y/x;
           END IF;
           UPDATE bar_avg
                  SET bar_avg.bar_avg = y
                  WHERE foo_id = OLD.foo_id;
           RETURN OLD;
      ELSE
           y := y + NEW.bar;
           x := 1;
           FOR bar_record IN SELECT * FROM bar LOOP
                 IF OLD.bar_id <> bar_record.bar_id THEN
                       y := y + bar_record.bar;
                       x := x + 1;
                 END IF;
           END LOOP;
           y := y/x;
           UPDATE bar_avg
                 SET bar_avg.bar_avg = y
                 WHERE foo_id = OLD.foo_id;
           RETURN NEW;
      END IF;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER get_bar_avg BEFORE INSERT OR DELETE OR UPDATE ON bar FOR EACH ROW
EXECUTE PROCEDURE get_bar_avg();

That should work,

regards,
matt


Quoting "Jim C. Nasby" <decibel@decibel.org>:

> On Fri, Jan 21, 2005 at 12:53:45AM -0500, Greg Stark wrote:
> > "Vanole, Mike" <Mike.Vanole@cingular.com> writes:
> >
> > > I need to calculate a moving average and I would like to do it with
> SQL,
> > > or a Pg function built for this purpose. I'm on Pg 7.4. Is this
> possible
> > > in Pg without a bunch of self joins, or is there a funtion available?
> >
> > Unfortunately moving averages fall into a class of functions, called
> analytic
> > functions (at least that's what Oracle calls them) that are inherently hard
> to
> > model efficiently in SQL. Postgres doesn't have any special support for
> this
> > set of functions, so you're stuck doing it the inefficient ways that
> standard
> > SQL allows.
> >
> > I think this is even hard to implement correctly using Postgres's
> extremely
> > extensible function support. Even if you implemented it in Perl or Python
> I
> > don't think there's any way to allocate a temporary static storage area for
> a
> > given call site. So your moving average function would behave strangely if
> you
> > called it twice in a given query.
> >
> > But if you can work within that caveat it should be straightforward to
> > implement it efficiently in Perl or Python. Alternatively you can write a
> > plpgsql function to calculate the specific moving average you need that
> does
> > the select itself.
>
> If you're feeling adventurous, you might look at Oracle's documentation
> on their analytic functions and see if you can come up with something
> generic for PostgreSQL. Even if you only do a moving average function it
> would be a good start.
> --
> Jim C. Nasby, Database Consultant               decibel@decibel.org
> Give your computer some brain candy! www.distributed.net Team #1828
>
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

Re: Calculating a moving average (Coding style)

From

Alban Hertroys

Date:

24 January 2005, 09:32:31

mstory@uchicago.edu wrote:
> CREATE OR REPLACE FUNCTION get_bar_avg() RETURNS TRIGGER AS '
> DECLARE
> bar_record RECORD;
> x INTEGER;
> y DOUBLE PRECISION := 0;
> BEGIN
>      IF TG_OP = ''INSERT'' THEN
>           y := y + NEW.bar;
...
>           RETURN NEW;
>      ELSIF TG_OP = ''DELETE'' THEN
>            x := 0;
...
>            RETURN OLD;
>       ELSE
>            y := y + NEW.bar;
...
>            RETURN NEW;
>       END IF;
> END;
> ' LANGUAGE plpgsql;

I see people do this from time to time. Just out of curiosity, is this
considered good coding style, or is it considered "lazyness"? I'm not
sure what to think of it.

If I would have written this, there would have been 3 triggers w/o the
check on TG_OP. Is there an important drawback to doing so? Is there any
document on "preferred" coding style in PL/PGSQL?

Yes, I'm a bit of a purist...

--
Regards,

Alban Hertroys
MAG Productions

T: +31(0)53 4346874
F: +31(0)53 4346876
E: alban@magproductions.nl
W: http://www.magproductions.nl

Re: Calculating a moving average (Coding style)

From

Russell Smith

Date:

24 January 2005, 10:07:23

On Mon, 24 Jan 2005 08:32 pm, Alban Hertroys wrote:
> mstory@uchicago.edu wrote:
> > CREATE OR REPLACE FUNCTION get_bar_avg() RETURNS TRIGGER AS '
> > DECLARE
> > bar_record RECORD;
> > x INTEGER;
> > y DOUBLE PRECISION := 0;
> > BEGIN
> >      IF TG_OP = ''INSERT'' THEN
> >           y := y + NEW.bar;
> ...
> >           RETURN NEW;
> >      ELSIF TG_OP = ''DELETE'' THEN
> >            x := 0;
> ...
> >            RETURN OLD;
> >       ELSE
> >            y := y + NEW.bar;
> ...
> >            RETURN NEW;
> >       END IF;
> > END;
> > ' LANGUAGE plpgsql;
>
> I see people do this from time to time. Just out of curiosity, is this
> considered good coding style, or is it considered "lazyness"? I'm not
> sure what to think of it.
>
> If I would have written this, there would have been 3 triggers w/o the
> check on TG_OP. Is there an important drawback to doing so? Is there any
> document on "preferred" coding style in PL/PGSQL?
>
> Yes, I'm a bit of a purist...
>
Given you have to define a function for each trigger, my view is why write more functions.

Along with this.  As a C programmer, I would do a few more IF tests in a function, rather than
write another one.  I find that triggers like this are one functional block and all go together.
Then when you update the function, it's all in one place.

Others may have "better" reasons for why they do it the way they do.  But they are mine.

Regards

Russell Smith.

Re: Calculating a moving average (Coding style)

From

mstory@uchicago.edu

Date:

25 January 2005, 00:17:52

I personally use 3 seperate triggers on most occasions, depending on how
different the action for each seperate action is, it's just easier for me and my
people to logically distinguish the functions that way, but the example in the
7.4 documentation for triggers is given using the form that i wrote this
function in, and is useful when the trigger procedures for all 3 actions do
roughly the same thing, it also makes editing the code a little easier in this
situation.

regards,
matt

Quoting Russell Smith <mr-russ@pws.com.au>:

> On Mon, 24 Jan 2005 08:32 pm, Alban Hertroys wrote:
> > mstory@uchicago.edu wrote:
> > > CREATE OR REPLACE FUNCTION get_bar_avg() RETURNS TRIGGER AS '
> > > DECLARE
> > > bar_record RECORD;
> > > x INTEGER;
> > > y DOUBLE PRECISION := 0;
> > > BEGIN
> > >      IF TG_OP = ''INSERT'' THEN
> > >           y := y + NEW.bar;
> > ...
> > >           RETURN NEW;
> > >      ELSIF TG_OP = ''DELETE'' THEN
> > >            x := 0;
> > ...
> > >            RETURN OLD;
> > >       ELSE
> > >            y := y + NEW.bar;
> > ...
> > >            RETURN NEW;
> > >       END IF;
> > > END;
> > > ' LANGUAGE plpgsql;
> >
> > I see people do this from time to time. Just out of curiosity, is this
> > considered good coding style, or is it considered "lazyness"? I'm not
> > sure what to think of it.
> >
> > If I would have written this, there would have been 3 triggers w/o the
> > check on TG_OP. Is there an important drawback to doing so? Is there any
> > document on "preferred" coding style in PL/PGSQL?
> >
> > Yes, I'm a bit of a purist...
> >
> Given you have to define a function for each trigger, my view is why write
> more functions.
>
> Along with this.  As a C programmer, I would do a few more IF tests in a
> function, rather than
> write another one.  I find that triggers like this are one functional block
> and all go together.
> Then when you update the function, it's all in one place.
>
> Others may have "better" reasons for why they do it the way they do.  But
> they are mine.
>
> Regards
>
> Russell Smith.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

self-join on subselect

From

PFC

Date:

26 January 2005, 22:10:49

    How do you do a self-join on a subselect ?

    like
    SELECT a.x+b.x FROM (subselect) a, (subselect) b WHERE a.id = b.id+10

    but without performing the subselect twice
..................................?

Re: Calculating a moving average

From

PFC

Date:

26 January 2005, 22:18:40

    Make a plpgsql function which will iterate over the rows on which the
moving average is to be done (FOR row IN SELECT), of course use the
correct order, then use an array as a FIFO, add a row to the moving
average and push it, pop the old one and substract it.
    Roundoff errors will bite your nether regions.

    I spose the purpose of this is not to fetch the whole thing so that your
moving average will not have one result row per source row (or you'd do it
in the application), thus you can pre-shrink your dataset by putting some
avg() and group by in your source select.



> Hi,
> I need to calculate a moving average and I would like to do it with SQL,
> or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible
> in Pg without a bunch of self joins, or is there a funtion available?
> Thanks,
> Mike

Re: self-join on subselect

From

"Jim C. Nasby"

Date:

27 January 2005, 01:25:15

It's not at all clear what you're asking. Do you have a real example,
preferably with EXPLAIN output?

On Wed, Jan 26, 2005 at 11:12:25PM +0100, PFC wrote:
>
>     How do you do a self-join on a subselect ?
>
>     like
>     SELECT a.x+b.x FROM (subselect) a, (subselect) b WHERE a.id = b.id+10
>
>     but without performing the subselect twice
> ..................................?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match
>

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: self-join on subselect

From

Bruno Wolff III

Date:

27 January 2005, 05:31:28

On Wed, Jan 26, 2005 at 23:12:25 +0100,
  PFC <lists@boutiquenumerique.com> wrote:
>
>     How do you do a self-join on a subselect ?
>
>     like
>     SELECT a.x+b.x FROM (subselect) a, (subselect) b WHERE a.id = b.id+10
>
>     but without performing the subselect twice
> ..................................?

You have to write the subselect twice if you want to match up different rows.
In some cases it might be worth making a copy in a temp table. For simple
subselects where there is an index on id, leaving it as is should work fine.

Re: self-join on subselect

From

PFC

Date:

27 January 2005, 09:47:32

    Thanks, I was hoping there was some way to avoid it.

> You have to write the subselect twice if you want to match up different
> rows.
> In some cases it might be worth making a copy in a temp table. For simple
> subselects where there is an index on id, leaving it as is should work
> fine.
>