Thread: Trigger or Function
Hello, I'm a postgres newbie and am wondering what's the best way to do this. I am gathering some data and will be inserting to a table once daily. The table is quite simple but I want the updates to be as efficient as possible since this db is part of a big data project. Say I have a table with these columns: | Date | Hostname | DayVal | WeekAvg | MonthAvg | When I insert a new row I have the values for Date, Hostname, DayVal. Is it possible to define the table is such a way that the WeekAvg and MonthAvg are automatically updated as follows? WeekAvg = current rows DayVal plus the sum of DayVal for the previous 6 rows. MonthAvg = current row's DayVal plus the sum of DayVal for the previous 29 rows. Should I place the logic in a Trigger or in a Function? Does someone have an example or a link showing how I could set this up? Regards, Alan
On Tue, Jul 12, 2011 at 9:41 AM, alan <alan.miller3@gmail.com> wrote: > Hello, > I'm a postgres newbie and am wondering what's the best way to do this. > > I am gathering some data and will be inserting to a table once daily. > The table is quite simple but I want the updates to be as efficient as > possible since > this db is part of a big data project. > > Say I have a table with these columns: > | Date | Hostname | DayVal | WeekAvg | MonthAvg | > > When I insert a new row I have the values for Date, Hostname, DayVal. > Is it possible to define the table is such a way that the WeekAvg and > MonthAvg > are automatically updated as follows? > WeekAvg = current rows DayVal plus the sum of DayVal for the > previous 6 rows. > MonthAvg = current row's DayVal plus the sum of DayVal for the > previous 29 rows. > > Should I place the logic in a Trigger or in a Function? > Does someone have an example or a link showing how I could set this > up? IMHO that design does not fit the relational model well because you are trying to store multirow aggregate values in individual rows. For example, your values will be wrong if you insert rows in the wrong order (i.e. today's data before yesterday's data). My first approach would be to remove WeekAvg and MonthAvg from the table and create a view which calculates appropriate values. If that proves too inefficient (e.g. because the data set is too huge and too much data is queried for individual queries) we can start optimizing. One approach to optimizing would be to have secondary tables | Week | Hostname | WeekAvg | | Month | Hostname | MonthAvg | and update them with an insert trigger and probably also with an update and delete trigger. If you actually need increasing values (i.e. running totals) you can use windowing functions (analytic SQL in Oracle). View definitions then of course need to change. http://www.postgresql.org/docs/9.0/interactive/queries-table-expressions.html#QUERIES-WINDOW Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
> My first approach would be to remove WeekAvg and MonthAvg from the > table and create a view which calculates appropriate values. Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing features. Here is how I set it up. If anyone sees an issue, please let me know. I'm new to postgres. Basically, my "daily_vals" table contains HOST, DATE, & VALUE columns. What I wanted was a way to automatically populate a 4th column called "rolling_average", which would be the sum of <n> preceding columns. testdb=# select * from daily_vals; rid | date | host | value -----+------------+--------+------------- 1 | 2011-07-01 | hosta | 100.0000 2 | 2011-07-02 | hosta | 200.0000 3 | 2011-07-03 | hosta | 400.0000 4 | 2011-07-04 | hosta | 500.0000 5 | 2011-07-05 | hosta | 100.0000 6 | 2011-07-06 | hosta | 700.0000 7 | 2011-07-07 | hosta | 200.0000 8 | 2011-07-08 | hosta | 100.0000 9 | 2011-07-09 | hosta | 100.0000 10 | 2011-07-10 | hosta | 100.0000 11 | 2011-07-01 | hostb | 5.7143 12 | 2011-07-02 | hostb | 8.5714 13 | 2011-07-03 | hostb | 11.4286 14 | 2011-07-04 | hostb | 8.5714 15 | 2011-07-05 | hostb | 2.8571 16 | 2011-07-06 | hostb | 1.4286 17 | 2011-07-07 | hostb | 1.4286 I created a view called weekly_average using this VIEW statement. CREATE OR REPLACE VIEW weekly_average AS SELECT *, sum(value) OVER (PARTITION BY host ORDER BY rid ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) as rolling_average FROM daily_vals; The I query the view just like a regular table. the rolling average is calulated from the previuous 6 rows (for each host). testdb=# select * from weekly_average; rid | date | host | value | rolling_average -----+------------+--------+----------+------------------ 1 | 2011-07-01 | hosta | 100.0000 | 100.0000 2 | 2011-07-02 | hosta | 200.0000 | 300.0000 3 | 2011-07-03 | hosta | 400.0000 | 700.0000 4 | 2011-07-04 | hosta | 500.0000 | 1200.0000 5 | 2011-07-05 | hosta | 100.0000 | 1300.0000 6 | 2011-07-06 | hosta | 700.0000 | 2000.0000 7 | 2011-07-07 | hosta | 200.0000 | 1400.0000 8 | 2011-07-08 | hosta | 100.0000 | 1400.0000 9 | 2011-07-09 | hosta | 100.0000 | 1200.0000 10 | 2011-07-10 | hosta | 100.0000 | 600.0000 11 | 2011-07-01 | hostb | 5.7143 | 5.7143 12 | 2011-07-02 | hostb | 8.5714 | 14.2857 13 | 2011-07-03 | hostb | 11.4286 | 25.7143 14 | 2011-07-04 | hostb | 8.5714 | 34.2857 15 | 2011-07-05 | hostb | 2.8571 | 37.1428 16 | 2011-07-06 | hostb | 1.4286 | 38.5714 17 | 2011-07-07 | hostb | 1.4286 | 40.0000 Alan
On 24/07/11 03:58, alan wrote: >> My first approach would be to remove WeekAvg and MonthAvg from the >> table and create a view which calculates appropriate values. > Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing > features. > Here is how I set it up. If anyone sees an issue, please let me know. > I'm new to postgres. > > Basically, my "daily_vals" table contains HOST, DATE,& VALUE columns. > What I wanted was a way to automatically populate a 4th column > called "rolling_average", which would be the sum of<n> preceding > columns. > > testdb=# select * from daily_vals; > rid | date | host | value > -----+------------+--------+------------- > 1 | 2011-07-01 | hosta | 100.0000 > 2 | 2011-07-02 | hosta | 200.0000 > 3 | 2011-07-03 | hosta | 400.0000 > 4 | 2011-07-04 | hosta | 500.0000 > 5 | 2011-07-05 | hosta | 100.0000 > 6 | 2011-07-06 | hosta | 700.0000 > 7 | 2011-07-07 | hosta | 200.0000 > 8 | 2011-07-08 | hosta | 100.0000 > 9 | 2011-07-09 | hosta | 100.0000 > 10 | 2011-07-10 | hosta | 100.0000 > 11 | 2011-07-01 | hostb | 5.7143 > 12 | 2011-07-02 | hostb | 8.5714 > 13 | 2011-07-03 | hostb | 11.4286 > 14 | 2011-07-04 | hostb | 8.5714 > 15 | 2011-07-05 | hostb | 2.8571 > 16 | 2011-07-06 | hostb | 1.4286 > 17 | 2011-07-07 | hostb | 1.4286 > > > I created a view called weekly_average using this VIEW statement. > > CREATE OR REPLACE > VIEW weekly_average > AS SELECT *, sum(value) OVER (PARTITION BY host > ORDER BY rid > ROWS BETWEEN 6 PRECEDING AND CURRENT ROW > ) as rolling_average FROM daily_vals; > > > The I query the view just like a regular table. > the rolling average is calulated from the previuous 6 rows (for each > host). > > testdb=# select * from weekly_average; > rid | date | host | value | rolling_average > -----+------------+--------+----------+------------------ > 1 | 2011-07-01 | hosta | 100.0000 | 100.0000 > 2 | 2011-07-02 | hosta | 200.0000 | 300.0000 > 3 | 2011-07-03 | hosta | 400.0000 | 700.0000 > 4 | 2011-07-04 | hosta | 500.0000 | 1200.0000 > 5 | 2011-07-05 | hosta | 100.0000 | 1300.0000 > 6 | 2011-07-06 | hosta | 700.0000 | 2000.0000 > 7 | 2011-07-07 | hosta | 200.0000 | 1400.0000 > 8 | 2011-07-08 | hosta | 100.0000 | 1400.0000 > 9 | 2011-07-09 | hosta | 100.0000 | 1200.0000 > 10 | 2011-07-10 | hosta | 100.0000 | 600.0000 > 11 | 2011-07-01 | hostb | 5.7143 | 5.7143 > 12 | 2011-07-02 | hostb | 8.5714 | 14.2857 > 13 | 2011-07-03 | hostb | 11.4286 | 25.7143 > 14 | 2011-07-04 | hostb | 8.5714 | 34.2857 > 15 | 2011-07-05 | hostb | 2.8571 | 37.1428 > 16 | 2011-07-06 | hostb | 1.4286 | 38.5714 > 17 | 2011-07-07 | hostb | 1.4286 | 40.0000 > > Alan > > > The above gives just the rolling sum, you need to divide by the number of rows in the sum to get the average (I assume you want the arithmetic mean, as the are many types of average!). CREATE OR REPLACE VIEW weekly_average AS SELECT *, round((sum(value) OVER mywindow / LEAST(6, (row_number() OVER mywindow))), 4) AS rolling_average FROM daily_vals WINDOW mywindow AS ( PARTITION BY host ORDER BY rid ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ); Cheers, Gavin
On Sat, Jul 30, 2011 at 3:01 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote: > On 24/07/11 03:58, alan wrote: >>> >>> My first approach would be to remove WeekAvg and MonthAvg from the >>> table and create a view which calculates appropriate values. >> >> Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing >> features. >> Here is how I set it up. If anyone sees an issue, please let me know. >> I'm new to postgres. >> >> Basically, my "daily_vals" table contains HOST, DATE,& VALUE columns. >> What I wanted was a way to automatically populate a 4th column >> called "rolling_average", which would be the sum of<n> preceding >> columns. There seems to be contradiction in the naming here. Did you mean "avg of<n> preceding columns."? >> I created a view called weekly_average using this VIEW statement. >> >> CREATE OR REPLACE >> VIEW weekly_average >> AS SELECT *, sum(value) OVER (PARTITION BY host >> ORDER BY rid >> ROWS BETWEEN 6 PRECEDING AND CURRENT ROW >> ) as rolling_average FROM daily_vals; > The above gives just the rolling sum, you need to divide by the number of > rows in the sum to get the average (I assume you want the arithmetic mean, > as the are many types of average!). > > CREATE OR REPLACE > VIEW weekly_average > AS SELECT > *, > round((sum(value) OVER mywindow / LEAST(6, (row_number() OVER > mywindow))), 4) AS rolling_average > FROM daily_vals > WINDOW mywindow AS > ( > PARTITION BY host > ORDER BY rid > ROWS BETWEEN 6 PRECEDING AND CURRENT ROW > ); Why not CREATE OR REPLACE VIEW weekly_average AS SELECT *, avg(value) OVER (PARTITION BY host ORDER BY rid ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) as rolling_average FROM daily_vals; What did I miss? Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
On 01/08/11 19:18, Robert Klemme wrote: > On Sat, Jul 30, 2011 at 3:01 AM, Gavin Flower > <GavinFlower@archidevsys.co.nz> wrote: >> On 24/07/11 03:58, alan wrote: >>>> My first approach would be to remove WeekAvg and MonthAvg from the >>>> table and create a view which calculates appropriate values. >>> Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing >>> features. >>> Here is how I set it up. If anyone sees an issue, please let me know. >>> I'm new to postgres. >>> >>> Basically, my "daily_vals" table contains HOST, DATE,& VALUE columns. >>> What I wanted was a way to automatically populate a 4th column >>> called "rolling_average", which would be the sum of<n> preceding >>> columns. > There seems to be contradiction in the naming here. Did you mean "avg > of<n> preceding columns."? > >>> I created a view called weekly_average using this VIEW statement. >>> >>> CREATE OR REPLACE >>> VIEW weekly_average >>> AS SELECT *, sum(value) OVER (PARTITION BY host >>> ORDER BY rid >>> ROWS BETWEEN 6 PRECEDING AND CURRENT ROW >>> ) as rolling_average FROM daily_vals; >> The above gives just the rolling sum, you need to divide by the number of >> rows in the sum to get the average (I assume you want the arithmetic mean, >> as the are many types of average!). >> >> CREATE OR REPLACE >> VIEW weekly_average >> AS SELECT >> *, >> round((sum(value) OVER mywindow / LEAST(6, (row_number() OVER >> mywindow))), 4) AS rolling_average >> FROM daily_vals >> WINDOW mywindow AS >> ( >> PARTITION BY host >> ORDER BY rid >> ROWS BETWEEN 6 PRECEDING AND CURRENT ROW >> ); > Why not > > CREATE OR REPLACE > VIEW weekly_average > AS SELECT *, avg(value) OVER (PARTITION BY host > ORDER BY rid > ROWS BETWEEN 6 PRECEDING AND CURRENT ROW > ) as rolling_average FROM daily_vals; > > What did I miss? > > Kind regards > > robert > <Chuckle> Your fix is much more elegant and efficient, though both approaches work!