Thread: Triggers, Stored Procedures to Aggregate table ?
Hi,
Very new to triggers and sprocs.
I have a few tables which store raw data on a minute basis. I want to aggregate this data into another table that stores an hour's worth of data. And from there, I want to roll this hourly aggregate table up into another aggregate table for a day's worth of data.
You get the direction I am going with this :-) ... hourly, daily, and weekly aggregated data in their respective tables.
I could write some Java code to run periodically over these tables to transform them into the aggregate tables, but that would add overhead (network, disk I/O). I am wondering if there is an easy way to do this at the Postgres level, where a trigger on a particular table calls a stored procedure that computes the aggregates (min, max, avg) and stores them in a new table.
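For concreteness, the kind of hourly roll-up I have in mind would be roughly the query below, run once per hour for the just-completed hour (the table and column names are just made up for illustration):

-- Hypothetical tables: raw_minute(metric_id, sample_time, reading)
-- rolled up into agg_hourly(metric_id, sample_hour, min_reading, max_reading, avg_reading).
INSERT INTO agg_hourly (metric_id, sample_hour, min_reading, max_reading, avg_reading)
SELECT metric_id,
       date_trunc('hour', sample_time) AS sample_hour,
       min(reading),
       max(reading),
       avg(reading)
FROM   raw_minute
WHERE  sample_time >= date_trunc('hour', now()) - interval '1 hour'
AND    sample_time <  date_trunc('hour', now())
GROUP  BY metric_id, date_trunc('hour', sample_time);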
Any pointers or suggestions or examples would be highly appreciated !
Thanks!
Arvind
On Thu, Jul 8, 2010 at 7:27 AM, Arvind Sharma <arvind321@yahoo.com> wrote:
> Very new to the Triggers and Sprocs.
>
> I have few tables which stores raw data on minute basis. I want to aggregate
> this data into another table to store every hour worth of data. And from
> there on - from this hourly Aggregated table, want to store into another
> Aggregate Table for a day's worth of data.
>
> You got the direction I am going with this.. :-).... Hourly, Daily, Weekly
> aggregated data into their respective tables.
>
> Any pointers or suggestions or examples would be highly appreciated !

Here is a resource that describes various ways to create materialized views:

http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

--
Regards,
Richard Broersma Jr.
Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
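As a minimal sketch of the simplest ("snapshot") style described on that page, a summary table can be rebuilt on a schedule, e.g. from cron via psql. Table and column names below are hypothetical:

-- Hypothetical summary table, built once from the raw minute data.
CREATE TABLE agg_hourly_snapshot AS
SELECT metric_id,
       date_trunc('hour', sample_time) AS sample_hour,
       min(reading) AS min_reading,
       max(reading) AS max_reading,
       avg(reading) AS avg_reading
FROM   raw_minute
GROUP  BY metric_id, date_trunc('hour', sample_time);

-- Each refresh: wipe and repopulate in one transaction
-- (TRUNCATE takes an exclusive lock, so readers briefly block during the rebuild).
BEGIN;
TRUNCATE agg_hourly_snapshot;
INSERT INTO agg_hourly_snapshot
SELECT metric_id,
       date_trunc('hour', sample_time),
       min(reading), max(reading), avg(reading)
FROM   raw_minute
GROUP  BY metric_id, date_trunc('hour', sample_time);
COMMIT;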
On 07/08/2010 07:27 AM, Arvind Sharma wrote:
> I have few tables which stores raw data on minute basis. I want to
> aggregate this data into another table to store every hour worth of
> data. And from there on - from this hourly Aggregated table, want to
> store into another Aggregate Table for a day's worth of data.
>
> You got the direction I am going with this.. :-).... Hourly, Daily,
> Weekly aggregated data into their respective tables.
>
> I could write some Java code to run periodically on these tables to
> transform them into Aggregate tables but that would have the overhead
> (Network, Disk I/O). I am wondering if there is any easy way to be
> able to write something at the Postgres level, where some Trigger will
> call some Stored Procedure on a particular table which will do the
> Aggregate (min, max, avg) and store that into a new table.

No matter what you do there is going to be overhead -- you just have to
decide when is the most appropriate or least intrusive time to incur
that overhead. A few questions come to mind:

1) Do you need immediate access to the most recent data, or can you
batch up data and live with, for example, always having the last
completed hour available?

2) Do you need continuous aggregation (e.g. the average for current
hour so far, the average for current day so far) or do you only want
aggregation of completed time periods (last hour's average,
yesterday's average, etc.)?

Over the years I have done something similar to what you describe in at
least four ways:

1) Aggregate on demand
2) Batch aggregate on a periodic basis -- e.g. run your aggregate query
with a cron job which truncates and rebuilds a table (i.e. a
"materialized view")
3) Write a C based trigger that does "continuous aggregation" to a
materialized table
4) Write a C based bulk loader that aggregates as it bulk loads the raw
data into a materialized table

HTH,
Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support
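(For illustration only: the "continuous aggregation" idea in option 3 does not have to be in C to try out -- a PL/pgSQL trigger has the same shape, just slower. All table, column, and function names below are hypothetical, and the sketch ignores concurrency.)

-- One running row per (metric_id, hour); the average is sum_reading / n.
CREATE TABLE agg_hourly_running (
    metric_id   integer,
    sample_hour timestamptz,
    n           bigint,
    min_reading double precision,
    max_reading double precision,
    sum_reading double precision,
    PRIMARY KEY (metric_id, sample_hour)
);

CREATE OR REPLACE FUNCTION raw_minute_rollup() RETURNS trigger AS $$
BEGIN
    -- Try to fold the new reading into the existing row for its hour.
    UPDATE agg_hourly_running
       SET n           = n + 1,
           min_reading = least(min_reading, NEW.reading),
           max_reading = greatest(max_reading, NEW.reading),
           sum_reading = sum_reading + NEW.reading
     WHERE metric_id   = NEW.metric_id
       AND sample_hour = date_trunc('hour', NEW.sample_time);

    -- First reading of a new hour: create the row.
    -- Note: two concurrent inserts for a brand-new hour can both reach this
    -- branch; real code needs a unique-violation handler or retry.
    IF NOT FOUND THEN
        INSERT INTO agg_hourly_running
               (metric_id, sample_hour, n, min_reading, max_reading, sum_reading)
        VALUES (NEW.metric_id, date_trunc('hour', NEW.sample_time),
                1, NEW.reading, NEW.reading, NEW.reading);
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER raw_minute_rollup_trg
AFTER INSERT ON raw_minute
FOR EACH ROW EXECUTE PROCEDURE raw_minute_rollup();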
The aggregated data is not required to be present immediately. A few minutes past the hour is just fine.
But the data does need to be continuously aggregated into the hourly/daily/weekly tables, since the raw data in the minute table needs to be cleaned up/deleted once it falls outside the lookback window (say, the past 3 days). The idea is that the user will always get aggregated data for the past hours and days (e.g. 3 days), which they can use to drill down into more granular data from the minute table (if that is still available and not yet deleted).
I am leaning towards letting the DB do the job of aggregation via triggers/stored procedures rather than the main business logic. Hopefully this will keep database round-trips and any other overhead to a minimum, since the data is aggregated very close to its source.
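For the cleanup side, I am picturing something as simple as a periodic delete of minute-level rows that have fallen outside the lookback window (table name and the 3-day window are just examples):

-- Periodic retention cleanup for the minute-level table (run from cron or a scheduled job).
DELETE FROM raw_minute
WHERE sample_time < now() - interval '3 days';
-- On a large/busy table, partitioning by day and dropping whole partitions
-- is usually cheaper than a big DELETE, but this shows the idea.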
Arvind
From: Joe Conway <mail@joeconway.com>
To: Arvind Sharma <arvind321@yahoo.com>
Cc: pgsql-novice@postgresql.org
Sent: Thu, July 8, 2010 10:08:12 AM
Subject: Re: [NOVICE] Triggers, Stored Procedures to Aggregate table ?
On 07/08/2010 07:27 AM, Arvind Sharma wrote:
> I have few tables which stores raw data on minute basis. I want to
> aggregate this data into another table to store every hour worth of
> data. And from there on - from this hourly Aggregated table, want to
> store into another Aggregate Table for a day's worth of data.
>
> You got the direction I am going with this.. :-).... Hourly, Daily,
> Weekly aggregated data into their respective tables.
>
> I could write some Java code to run periodically on these tables to
> transform them into Aggregate tables but that would have the overhead
> (Network, Disk I/O). I am wondering if there is any easy way to be
> able to write something at the Postgres level, where some Trigger will
> call some Stored Procedure on a particular table which will do the
> Aggregate (min, max, avg) and store that into a new table.
No matter what you do there is going to be overhead -- you just have to
decide when is the most appropriate or least intrusive time to incur
that overhead. A few questions come to mind:
1) Do you need immediate access to the most recent data, or can you
batch up data and live with, for example, always having the last
completed hour available?
2) Do you need continuous aggregation (e.g. the average for current
hour so far, the average for current day so far) or do you only want
aggregation of completed time periods (last hour's average,
yesterday's average, etc.)?
Over the years I have done something similar to what you describe in at
least four ways:
1) Aggregate on demand
2) Batch aggregate on a periodic basis -- e.g. run your aggregate query
with a cron job which truncates and rebuilds a table (i.e. a
"materialized view")
3) Write a C based trigger that does "continuous aggregation" to a
materialized table
4) Write a C based bulk loader that aggregates as it bulk loads the raw
data into a materialized table
HTH,
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support