Re: Materialized views proposal - Mailing list pgsql-hackers
From | Sailesh Krishnamurthy |
---|---|
Subject | Re: Materialized views proposal |
Date | |
Msg-id | bxyisl11nl5.fsf@datafix.cs.berkeley.edu Whole thread Raw |
In response to | Re: Materialized views proposal (Hannu Krosing <hannu@tm.ee>) |
List | pgsql-hackers |
>>>>> "Hannu" == Hannu Krosing <hannu@tm.ee> writes: Hannu> Neil Conway kirjutas P, 30.11.2003 kell 02:18: >> Jonathan Gardner <jgardner@jonathangardner.net> writes: > 3)We >> would implement some sort of differential view update scheme > >> based on the paper "Efficiently Updating Materialized >> Views"[1]. Hannu> Maybe the TelegraphCQ engine can give some ideas Hannu> http://telegraph.cs.berkeley.edu/ Hannu> from brief reading, it seems that all they do is kind of Hannu> materialized views ;) - they call it ContinuousDataflow Hannu> Processing, i.e. queries that run continuously over Hannu> incoming data. A fair portion of the community argues that everything that we do is merely materialized views :-) While materialized views are certainly related, there are however, things that are quite different in processing continuous queries over data streams. The first ideas on stream processing did come from work on scalable triggers which is intimately related to materialized views. Work on mviews isn't really concerned with things like windowed joins and aggregate processing. My advice is to take a very incremental approach to materialized view implementation. There are essentially 2 pieces - one the maintenance of the views and second the routing of queries to automagically use the materialized views. The former is done by something like triggers and the latter is through either a query-rewrite mechanism or something like an optimizer choosing a different access method. As for the former, there are again two subdivisions - what changes you propagate and when you apply 'em. What you propagate are the deltas. So you have the following choices: IPIA - Immediate Propagate, Immediate Apply IPDA - Immediate Propagate, Deferred Apply DPDA - Deferred Propagate, Deferred Apply DPIA - makes no sense .. The easiest is DPDA .. in this model you essentially recompute the view on demand. The next is IPIA.. with IPIA, you can choose to either recompute the entire view or only figure out how to translate a delta into appropriate changes in the mview. After that there is IPDA .. this involves creating a delta table that is populated based on the triggers. Every so often you process a set of deltas and recompute the view (either in its entirety or in parts> Another way to slice the problem is to limit the scope of queries that can be used to define views. I suggest that first we don't consider joins .. and only consider grouped aggregates over a single table. Joins are more hairy with changes in one table potentially requiring a wholesale recomputation of the join. Note that all this discussion is only about the first part of the problem .. maintenance of the views. The other part, routing is almost equally complicated .. there you have to solve the query subsumption problem without also eliminating choices in your access plan. As people have said there are plenty of papers on this in the literature. While I am no theorist I can certainly help with reading the papers .. not every bit of a paper is very useful. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
pgsql-hackers by date: