Re: Triggers, Stored Procedures to Aggregate table ? - Mailing list pgsql-novice

From Arvind Sharma
Subject Re: Triggers, Stored Procedures to Aggregate table ?
Date
Msg-id 481023.73929.qm@web110116.mail.gq1.yahoo.com
Whole thread Raw
In response to Re: Triggers, Stored Procedures to Aggregate table ?  (Joe Conway <mail@joeconway.com>)
List pgsql-novice
The aggregated data is not required to be present immediately. May be few minutes past the hour is just fine.

But the data needs to be continuously aggregated into hour/daily/weekly - since the raw data in the minute table needs to be cleaned/deleted past the lookback window (say, past 3 days).  The idea being, that the user will always get info on the aggregated data of the past hours and  days ( e.g. 3 days) which they can use to drill down to much granular data (if that is available and not deleted yet) from the minute table.

I am leaning towards let the DB do the job of aggregation based on the trigger/stored procedures and not the main business logic.  Hopefully this will keep the database round-trip and any other over-head to minimum - data aggregated being very local to the source.

Arvind



From: Joe Conway <mail@joeconway.com>
To: Arvind Sharma <arvind321@yahoo.com>
Cc: pgsql-novice@postgresql.org
Sent: Thu, July 8, 2010 10:08:12 AM
Subject: Re: [NOVICE] Triggers, Stored Procedures to Aggregate table ?

On 07/08/2010 07:27 AM, Arvind Sharma wrote:
> I have few tables which stores raw data on minute basis. I want to
> aggregate this data into another table to store every hour worth of
> data. And from there on - from this hourly Aggregated table, want to
> store into another Aggregate Table for a day's worth of data.
>
> You got the direction I am going with this.. :-)....  Hourly, Daily,
> Weekly aggregated data into their respective tables.
>
> I could write some Java code to run periodically on these tables to
> transform them into Aggregate tables but that would have the overhead
> (Network, Disk I/O).  I am wondering if there is any easy way to be
> able to write something at the Postgres level, where some Trigger will
> call some Stored Procedure on a particular table which will do the
> Aggregate (min, max, avg) and store that into a new table.

No matter what you do there is going to be overhead -- you just have to
decide when is the most appropriate or least intrusive time to incur
that overhead. Few questions come to mind:

1) Do you need immediate access to the most recent data, or can you
  batch up data and live with, for example, always having the last
  completed hour available?

2) Do you need continuous aggregation (e.g. the average for current
  hour so far, the average for current day so far) or do you only want
  aggregation of completed time periods (last hour's average,
  yesterday's average, etc.)?

Over the years I have done something similar to what you describe in at
least fours ways:

1) Aggregate on demand
2) Batch aggregate on a periodic basis -- e.g. run your aggregate query
  with a cron job which truncates and rebuilds a table (i.e. a
  "materialized view")
3) Write a C based trigger that does "continuous aggregation" to a
  materialized table
4) Write a C based bulk loader that aggregates as it bulk loads the raw
  data into a materialized table

HTH,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support


pgsql-novice by date:

Previous
From: Joe Conway
Date:
Subject: Re: Triggers, Stored Procedures to Aggregate table ?
Next
From: Andre Majorel
Date:
Subject: Bypassing authentication