Thread: Trigger or Function

Trigger or Function

From
alan
Date:
Hello,
I'm a postgres newbie and am wondering what's the best way to do this.

I am gathering some data and will be inserting to a table once daily.
The table is quite simple but I want the updates to be as efficient as
possible since
this db is part of a big data project.

Say I have a table with these columns:
| Date | Hostname | DayVal | WeekAvg | MonthAvg |

When I insert a new row I have the values for Date, Hostname, DayVal.
Is it possible to define the table is such a way that the WeekAvg and
MonthAvg
are automatically updated as follows?
  WeekAvg = current rows DayVal plus the sum of DayVal for the
previous 6 rows.
 MonthAvg = current row's DayVal plus the sum of DayVal for the
previous 29 rows.

Should I place the logic in a Trigger or in a Function?
Does someone have an example or a link showing how I could set this
up?

Regards,
Alan

Re: Trigger or Function

From
Robert Klemme
Date:
On Tue, Jul 12, 2011 at 9:41 AM, alan <alan.miller3@gmail.com> wrote:
> Hello,
> I'm a postgres newbie and am wondering what's the best way to do this.
>
> I am gathering some data and will be inserting to a table once daily.
> The table is quite simple but I want the updates to be as efficient as
> possible since
> this db is part of a big data project.
>
> Say I have a table with these columns:
> | Date | Hostname | DayVal | WeekAvg | MonthAvg |
>
> When I insert a new row I have the values for Date, Hostname, DayVal.
> Is it possible to define the table is such a way that the WeekAvg and
> MonthAvg
> are automatically updated as follows?
>  WeekAvg = current rows DayVal plus the sum of DayVal for the
> previous 6 rows.
>  MonthAvg = current row's DayVal plus the sum of DayVal for the
> previous 29 rows.
>
> Should I place the logic in a Trigger or in a Function?
> Does someone have an example or a link showing how I could set this
> up?

IMHO that design does not fit the relational model well because you
are trying to store multirow aggregate values in individual rows.  For
example, your values will be wrong if you insert rows in the wrong
order (i.e. today's data before yesterday's data).

My first approach would be to remove WeekAvg and MonthAvg from the
table and create a view which calculates appropriate values.

If that proves too inefficient (e.g. because the data set is too huge
and too much data is queried for individual queries) we can start
optimizing.  One approach to optimizing would be to have secondary
tables

| Week | Hostname | WeekAvg |
| Month | Hostname | MonthAvg |

and update them with an insert trigger and probably also with an
update and delete trigger.

If you actually need increasing values (i.e. running totals) you can
use windowing functions (analytic SQL in Oracle).  View definitions
then of course need to change.
http://www.postgresql.org/docs/9.0/interactive/queries-table-expressions.html#QUERIES-WINDOW

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: Trigger or Function

From
alan
Date:
> My first approach would be to remove WeekAvg and MonthAvg from the
> table and create a view which calculates appropriate values.

Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
features.
Here is how I set it up. If anyone sees an issue, please let me know.
I'm new to postgres.

Basically, my "daily_vals" table contains HOST, DATE, & VALUE columns.
What I wanted was a way to automatically populate a 4th column
called "rolling_average", which would be the sum of <n> preceding
columns.

testdb=# select * from daily_vals;
 rid |    date    |  host  |  value
-----+------------+--------+-------------
  1  | 2011-07-01 |  hosta | 100.0000
  2  | 2011-07-02 |  hosta | 200.0000
  3  | 2011-07-03 |  hosta | 400.0000
  4  | 2011-07-04 |  hosta | 500.0000
  5  | 2011-07-05 |  hosta | 100.0000
  6  | 2011-07-06 |  hosta | 700.0000
  7  | 2011-07-07 |  hosta | 200.0000
  8  | 2011-07-08 |  hosta | 100.0000
  9  | 2011-07-09 |  hosta | 100.0000
 10  | 2011-07-10 |  hosta | 100.0000
 11  | 2011-07-01 |  hostb |   5.7143
 12  | 2011-07-02 |  hostb |   8.5714
 13  | 2011-07-03 |  hostb |  11.4286
 14  | 2011-07-04 |  hostb |   8.5714
 15  | 2011-07-05 |  hostb |   2.8571
 16  | 2011-07-06 |  hostb |   1.4286
 17  | 2011-07-07 |  hostb |   1.4286


I created a view called weekly_average using this VIEW statement.

CREATE OR REPLACE
  VIEW weekly_average
    AS SELECT *, sum(value) OVER (PARTITION BY host
            ORDER BY rid
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
          ) as rolling_average FROM daily_vals;


The I query the view just like a regular table.
the rolling average is calulated from the previuous 6 rows (for each
host).

testdb=# select * from weekly_average;
 rid |    date    |  host  |  value   | rolling_average
-----+------------+--------+----------+------------------
  1  | 2011-07-01 |  hosta | 100.0000 |   100.0000
  2  | 2011-07-02 |  hosta | 200.0000 |   300.0000
  3  | 2011-07-03 |  hosta | 400.0000 |   700.0000
  4  | 2011-07-04 |  hosta | 500.0000 |  1200.0000
  5  | 2011-07-05 |  hosta | 100.0000 |  1300.0000
  6  | 2011-07-06 |  hosta | 700.0000 |  2000.0000
  7  | 2011-07-07 |  hosta | 200.0000 |  1400.0000
  8  | 2011-07-08 |  hosta | 100.0000 |  1400.0000
  9  | 2011-07-09 |  hosta | 100.0000 |  1200.0000
 10  | 2011-07-10 |  hosta | 100.0000 |   600.0000
 11  | 2011-07-01 |  hostb |   5.7143 |     5.7143
 12  | 2011-07-02 |  hostb |   8.5714 |    14.2857
 13  | 2011-07-03 |  hostb |  11.4286 |    25.7143
 14  | 2011-07-04 |  hostb |   8.5714 |    34.2857
 15  | 2011-07-05 |  hostb |   2.8571 |    37.1428
 16  | 2011-07-06 |  hostb |   1.4286 |    38.5714
 17  | 2011-07-07 |  hostb |   1.4286 |    40.0000

Alan



Re: Trigger or Function

From
Gavin Flower
Date:
On 24/07/11 03:58, alan wrote:
>> My first approach would be to remove WeekAvg and MonthAvg from the
>> table and create a view which calculates appropriate values.
> Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
> features.
> Here is how I set it up. If anyone sees an issue, please let me know.
> I'm new to postgres.
>
> Basically, my "daily_vals" table contains HOST, DATE,&  VALUE columns.
> What I wanted was a way to automatically populate a 4th column
> called "rolling_average", which would be the sum of<n>  preceding
> columns.
>
> testdb=# select * from daily_vals;
>   rid |    date    |  host  |  value
> -----+------------+--------+-------------
>    1  | 2011-07-01 |  hosta | 100.0000
>    2  | 2011-07-02 |  hosta | 200.0000
>    3  | 2011-07-03 |  hosta | 400.0000
>    4  | 2011-07-04 |  hosta | 500.0000
>    5  | 2011-07-05 |  hosta | 100.0000
>    6  | 2011-07-06 |  hosta | 700.0000
>    7  | 2011-07-07 |  hosta | 200.0000
>    8  | 2011-07-08 |  hosta | 100.0000
>    9  | 2011-07-09 |  hosta | 100.0000
>   10  | 2011-07-10 |  hosta | 100.0000
>   11  | 2011-07-01 |  hostb |   5.7143
>   12  | 2011-07-02 |  hostb |   8.5714
>   13  | 2011-07-03 |  hostb |  11.4286
>   14  | 2011-07-04 |  hostb |   8.5714
>   15  | 2011-07-05 |  hostb |   2.8571
>   16  | 2011-07-06 |  hostb |   1.4286
>   17  | 2011-07-07 |  hostb |   1.4286
>
>
> I created a view called weekly_average using this VIEW statement.
>
> CREATE OR REPLACE
>    VIEW weekly_average
>      AS SELECT *, sum(value) OVER (PARTITION BY host
>              ORDER BY rid
>              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>            ) as rolling_average FROM daily_vals;
>
>
> The I query the view just like a regular table.
> the rolling average is calulated from the previuous 6 rows (for each
> host).
>
> testdb=# select * from weekly_average;
>   rid |    date    |  host  |  value   | rolling_average
> -----+------------+--------+----------+------------------
>    1  | 2011-07-01 |  hosta | 100.0000 |   100.0000
>    2  | 2011-07-02 |  hosta | 200.0000 |   300.0000
>    3  | 2011-07-03 |  hosta | 400.0000 |   700.0000
>    4  | 2011-07-04 |  hosta | 500.0000 |  1200.0000
>    5  | 2011-07-05 |  hosta | 100.0000 |  1300.0000
>    6  | 2011-07-06 |  hosta | 700.0000 |  2000.0000
>    7  | 2011-07-07 |  hosta | 200.0000 |  1400.0000
>    8  | 2011-07-08 |  hosta | 100.0000 |  1400.0000
>    9  | 2011-07-09 |  hosta | 100.0000 |  1200.0000
>   10  | 2011-07-10 |  hosta | 100.0000 |   600.0000
>   11  | 2011-07-01 |  hostb |   5.7143 |     5.7143
>   12  | 2011-07-02 |  hostb |   8.5714 |    14.2857
>   13  | 2011-07-03 |  hostb |  11.4286 |    25.7143
>   14  | 2011-07-04 |  hostb |   8.5714 |    34.2857
>   15  | 2011-07-05 |  hostb |   2.8571 |    37.1428
>   16  | 2011-07-06 |  hostb |   1.4286 |    38.5714
>   17  | 2011-07-07 |  hostb |   1.4286 |    40.0000
>
> Alan
>
>
>
The above gives just the rolling sum, you need to divide by the number
of rows in the sum to get the average (I assume you want the arithmetic
mean, as the are many types of average!).

CREATE OR REPLACE
   VIEW weekly_average
     AS SELECT
         *,
         round((sum(value) OVER mywindow / LEAST(6, (row_number() OVER
mywindow))), 4) AS rolling_average
     FROM daily_vals
     WINDOW mywindow AS
     (
         PARTITION BY host
         ORDER BY rid
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
     );

Cheers,
Gavin

Re: Trigger or Function

From
Robert Klemme
Date:
On Sat, Jul 30, 2011 at 3:01 AM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> On 24/07/11 03:58, alan wrote:
>>>
>>> My first approach would be to remove WeekAvg and MonthAvg from the
>>> table and create a view which calculates appropriate values.
>>
>> Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
>> features.
>> Here is how I set it up. If anyone sees an issue, please let me know.
>> I'm new to postgres.
>>
>> Basically, my "daily_vals" table contains HOST, DATE,&  VALUE columns.
>> What I wanted was a way to automatically populate a 4th column
>> called "rolling_average", which would be the sum of<n>  preceding
>> columns.

There seems to be contradiction in the naming here.  Did you mean "avg
of<n>  preceding columns."?

>> I created a view called weekly_average using this VIEW statement.
>>
>> CREATE OR REPLACE
>>   VIEW weekly_average
>>     AS SELECT *, sum(value) OVER (PARTITION BY host
>>             ORDER BY rid
>>             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>>           ) as rolling_average FROM daily_vals;

> The above gives just the rolling sum, you need to divide by the number of
> rows in the sum to get the average (I assume you want the arithmetic mean,
> as the are many types of average!).
>
> CREATE OR REPLACE
>  VIEW weekly_average
>    AS SELECT
>        *,
>        round((sum(value) OVER mywindow / LEAST(6, (row_number() OVER
> mywindow))), 4) AS rolling_average
>    FROM daily_vals
>    WINDOW mywindow AS
>    (
>        PARTITION BY host
>        ORDER BY rid
>        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>    );

Why not

CREATE OR REPLACE
  VIEW weekly_average
    AS SELECT *, avg(value) OVER (PARTITION BY host
            ORDER BY rid
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
          ) as rolling_average FROM daily_vals;

What did I miss?

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: Trigger or Function

From
Gavin Flower
Date:
On 01/08/11 19:18, Robert Klemme wrote:
> On Sat, Jul 30, 2011 at 3:01 AM, Gavin Flower
> <GavinFlower@archidevsys.co.nz>  wrote:
>> On 24/07/11 03:58, alan wrote:
>>>> My first approach would be to remove WeekAvg and MonthAvg from the
>>>> table and create a view which calculates appropriate values.
>>> Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
>>> features.
>>> Here is how I set it up. If anyone sees an issue, please let me know.
>>> I'm new to postgres.
>>>
>>> Basically, my "daily_vals" table contains HOST, DATE,&    VALUE columns.
>>> What I wanted was a way to automatically populate a 4th column
>>> called "rolling_average", which would be the sum of<n>    preceding
>>> columns.
> There seems to be contradiction in the naming here.  Did you mean "avg
> of<n>   preceding columns."?
>
>>> I created a view called weekly_average using this VIEW statement.
>>>
>>> CREATE OR REPLACE
>>>    VIEW weekly_average
>>>      AS SELECT *, sum(value) OVER (PARTITION BY host
>>>              ORDER BY rid
>>>              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>>>            ) as rolling_average FROM daily_vals;
>> The above gives just the rolling sum, you need to divide by the number of
>> rows in the sum to get the average (I assume you want the arithmetic mean,
>> as the are many types of average!).
>>
>> CREATE OR REPLACE
>>   VIEW weekly_average
>>     AS SELECT
>>         *,
>>         round((sum(value) OVER mywindow / LEAST(6, (row_number() OVER
>> mywindow))), 4) AS rolling_average
>>     FROM daily_vals
>>     WINDOW mywindow AS
>>     (
>>         PARTITION BY host
>>         ORDER BY rid
>>         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>>     );
> Why not
>
> CREATE OR REPLACE
>    VIEW weekly_average
>      AS SELECT *, avg(value) OVER (PARTITION BY host
>              ORDER BY rid
>              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>            ) as rolling_average FROM daily_vals;
>
> What did I miss?
>
> Kind regards
>
> robert
>
<Chuckle> Your fix is much more elegant and efficient, though both
approaches work!