Thread: Data warehouse & OLAP
hi,
I'm working on the implementation of a data warehouse on Postgres.
For analysis of aggregated data I'd like to use some OLAP tool, for example Mondrian, OR use materialized views (if better).
My questions:
Is there any documentation about data warehousing (with OLAP or materialized views) on Postgres?
Which is better: an OLAP tool (I think), or materialized views?
Any information is welcome!
Thanks
Stefano
On 8/9/06, Stefano B. <stefano.bonnin@comai.to> wrote:

You'll obviously have to identify all possible changes to your normalized data that affect the data warehouse's consistency. As far as I know, pgsql doesn't directly support materialized views, so you'll write one or more update_warehouse functions and do one of the following:

1.) Set up a collection of triggers to keep track of all changes since the last synchronization, plus a periodic task (probably using pgAgent) which invokes the update_warehouse functions, which then update the needed records.

2.) Set up a collection of triggers directly calling your update functions.

The first option is what probably 99% of users need, because of the implicit nature of the queries run against warehouse data. An hour, a day or even a week of lag in the latest data very often makes no difference when analyzing an OLAP cube, but the exact tolerance level obviously depends on the exact queries analysts really run, the resources available/needed to refresh the warehouse, the nature of the data, etc.

The second option is nice in that it keeps the relational data in sync with the warehouse, but it can only be implemented in specific systems where the update load is tolerable, changes are limited in scope, and the update triggers are highly focused on the scope of the change. I can imagine circumstances where you'd need such a setup, but most of the time it's just a theoretical possibility.
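As a rough illustration of the first option, a change-log trigger plus a periodic refresh function might look like the sketch below. All table, column, and function names here (sales, sales_summary, warehouse_pending, log_sales_change) are hypothetical, not from the original post; the real schema and aggregation logic would differ.

```sql
-- Hypothetical sketch of option 1: log affected keys in a trigger,
-- then refresh the warehouse periodically (e.g. via pgAgent).

CREATE TABLE warehouse_pending (
    region     text NOT NULL,
    changed_at timestamp NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_sales_change() RETURNS trigger AS $$
BEGIN
    -- COALESCE covers DELETE, where NEW is NULL
    INSERT INTO warehouse_pending (region)
    VALUES (COALESCE(NEW.region, OLD.region));
    RETURN NULL;  -- return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_warehouse_log
AFTER INSERT OR UPDATE OR DELETE ON sales
FOR EACH ROW EXECUTE PROCEDURE log_sales_change();

-- Invoked periodically by the scheduled job:
CREATE OR REPLACE FUNCTION update_warehouse() RETURNS void AS $$
BEGIN
    -- Recompute only the aggregates touched since the last run
    UPDATE sales_summary s
    SET    total = (SELECT sum(amount) FROM sales
                    WHERE sales.region = s.region)
    WHERE  s.region IN (SELECT DISTINCT region FROM warehouse_pending);

    DELETE FROM warehouse_pending;
END;
$$ LANGUAGE plpgsql;

-- Option 2 would instead have the trigger call update_warehouse()
-- (or a narrowly scoped variant) directly, trading update latency
-- on the OLTP side for an always-current warehouse.
```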
I have implemented the second approach, using the very good Mondrian OLAP server and the stunning JRubik analysis interface, in one of my systems (a small database, several dozen MB, less than a million records total), at the price of a ~2 sec delay when updating a row - an action that happens about 50-60 times per day. The reason was very non-technical: we needed to be able to say "you can analyze data in real time".

Hope this helps.

Cheers,
t.n.a.