Thread: Triggers, Stored Procedures to Aggregate table ?

Triggers, Stored Procedures to Aggregate table ?

From
Arvind Sharma
Date:
Hi,

Very new to the Triggers and Sprocs.

I have few tables which stores raw data on minute basis. I want to aggregate this data into another table to store every hour worth of data. And from there on - from this hourly Aggregated table, want to store into another Aggregate Table for a day's worth of data.

You got the direction I am going with this.. :-)....  Hourly, Daily, Weekly aggregated data into their respective tables.

I could write some Java code to run periodically on these tables to transform them into Aggregate tables but that would have the overhead (Network, Disk I/O).   I am wondering if there is any easy way to be able to write something at the Postgres level, where some Trigger will call some Stored Procedure on a particular table which will do the Aggregate (min, max, avg) and store that into a new table.


Any pointers or suggestions or examples would be highly appreciated !


Thanks!
Arvind

Re: Triggers, Stored Procedures to Aggregate table ?

From
Richard Broersma
Date:
On Thu, Jul 8, 2010 at 7:27 AM, Arvind Sharma <arvind321@yahoo.com> wrote:

> Very new to the Triggers and Sprocs.
>
> I have few tables which stores raw data on minute basis. I want to aggregate
> this data into another table to store every hour worth of data. And from
> there on - from this hourly Aggregated table, want to store into another
> Aggregate Table for a day's worth of data.
>
> You got the direction I am going with this.. :-)....  Hourly, Daily, Weekly
> aggregated data into their respective tables.
>
> Any pointers or suggestions or examples would be highly appreciated !

Here is a resource that describes various ways to create materialized views:

http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Re: Triggers, Stored Procedures to Aggregate table ?

From
Joe Conway
Date:
On 07/08/2010 07:27 AM, Arvind Sharma wrote:
> I have few tables which stores raw data on minute basis. I want to
> aggregate this data into another table to store every hour worth of
> data. And from there on - from this hourly Aggregated table, want to
> store into another Aggregate Table for a day's worth of data.
>
> You got the direction I am going with this.. :-)....  Hourly, Daily,
> Weekly aggregated data into their respective tables.
>
> I could write some Java code to run periodically on these tables to
> transform them into Aggregate tables but that would have the overhead
> (Network, Disk I/O).   I am wondering if there is any easy way to be
> able to write something at the Postgres level, where some Trigger will
> call some Stored Procedure on a particular table which will do the
> Aggregate (min, max, avg) and store that into a new table.

No matter what you do there is going to be overhead -- you just have to
decide when is the most appropriate or least intrusive time to incur
that overhead. Few questions come to mind:

1) Do you need immediate access to the most recent data, or can you
   batch up data and live with, for example, always having the last
   completed hour available?

2) Do you need continuous aggregation (e.g. the average for current
   hour so far, the average for current day so far) or do you only want
   aggregation of completed time periods (last hour's average,
   yesterday's average, etc.)?

Over the years I have done something similar to what you describe in at
least fours ways:

1) Aggregate on demand
2) Batch aggregate on a periodic basis -- e.g. run your aggregate query
   with a cron job which truncates and rebuilds a table (i.e. a
   "materialized view")
3) Write a C based trigger that does "continuous aggregation" to a
   materialized table
4) Write a C based bulk loader that aggregates as it bulk loads the raw
   data into a materialized table

HTH,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support


Attachment

Re: Triggers, Stored Procedures to Aggregate table ?

From
Arvind Sharma
Date:
The aggregated data is not required to be present immediately. May be few minutes past the hour is just fine.

But the data needs to be continuously aggregated into hour/daily/weekly - since the raw data in the minute table needs to be cleaned/deleted past the lookback window (say, past 3 days).  The idea being, that the user will always get info on the aggregated data of the past hours and  days ( e.g. 3 days) which they can use to drill down to much granular data (if that is available and not deleted yet) from the minute table.

I am leaning towards let the DB do the job of aggregation based on the trigger/stored procedures and not the main business logic.  Hopefully this will keep the database round-trip and any other over-head to minimum - data aggregated being very local to the source.

Arvind



From: Joe Conway <mail@joeconway.com>
To: Arvind Sharma <arvind321@yahoo.com>
Cc: pgsql-novice@postgresql.org
Sent: Thu, July 8, 2010 10:08:12 AM
Subject: Re: [NOVICE] Triggers, Stored Procedures to Aggregate table ?

On 07/08/2010 07:27 AM, Arvind Sharma wrote:
> I have few tables which stores raw data on minute basis. I want to
> aggregate this data into another table to store every hour worth of
> data. And from there on - from this hourly Aggregated table, want to
> store into another Aggregate Table for a day's worth of data.
>
> You got the direction I am going with this.. :-)....  Hourly, Daily,
> Weekly aggregated data into their respective tables.
>
> I could write some Java code to run periodically on these tables to
> transform them into Aggregate tables but that would have the overhead
> (Network, Disk I/O).  I am wondering if there is any easy way to be
> able to write something at the Postgres level, where some Trigger will
> call some Stored Procedure on a particular table which will do the
> Aggregate (min, max, avg) and store that into a new table.

No matter what you do there is going to be overhead -- you just have to
decide when is the most appropriate or least intrusive time to incur
that overhead. Few questions come to mind:

1) Do you need immediate access to the most recent data, or can you
  batch up data and live with, for example, always having the last
  completed hour available?

2) Do you need continuous aggregation (e.g. the average for current
  hour so far, the average for current day so far) or do you only want
  aggregation of completed time periods (last hour's average,
  yesterday's average, etc.)?

Over the years I have done something similar to what you describe in at
least fours ways:

1) Aggregate on demand
2) Batch aggregate on a periodic basis -- e.g. run your aggregate query
  with a cron job which truncates and rebuilds a table (i.e. a
  "materialized view")
3) Write a C based trigger that does "continuous aggregation" to a
  materialized table
4) Write a C based bulk loader that aggregates as it bulk loads the raw
  data into a materialized table

HTH,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support