Re: Materialized views proposal - Mailing list pgsql-hackers

From Sailesh Krishnamurthy
Subject Re: Materialized views proposal
Date
Msg-id bxyisl11nl5.fsf@datafix.cs.berkeley.edu
Whole thread Raw
In response to Re: Materialized views proposal  (Hannu Krosing <hannu@tm.ee>)
List pgsql-hackers
>>>>> "Hannu" == Hannu Krosing <hannu@tm.ee> writes:
   Hannu> Neil Conway kirjutas P, 30.11.2003 kell 02:18:   >> Jonathan Gardner <jgardner@jonathangardner.net> writes: >
3)We   >> would implement some sort of differential view update scheme >   >> based on the paper "Efficiently Updating
Materialized  >> Views"[1].
 
   Hannu> Maybe the TelegraphCQ engine can give some ideas
   Hannu> http://telegraph.cs.berkeley.edu/
   Hannu> from brief reading, it seems that all they do is kind of   Hannu> materialized views ;) - they call it
ContinuousDataflow   Hannu> Processing, i.e. queries that run continuously over   Hannu> incoming data.
 


A fair portion of the community argues that everything that we do is
merely materialized views :-)

While materialized views are certainly related, there are however,
things that are quite different in processing continuous queries over
data streams. The first ideas on stream processing did come from work
on scalable triggers which is intimately related to materialized
views. Work on mviews isn't really concerned with things like windowed
joins and aggregate processing.

My advice is to take a very incremental approach to materialized view
implementation. There are essentially 2 pieces - one the maintenance
of the views and second the routing of queries to automagically use
the materialized views. The former is done by something like triggers
and the latter is through either a query-rewrite mechanism or
something like an optimizer choosing a different access method. 

As for the former, there are again two subdivisions - what changes you
propagate and when you apply 'em. What you propagate are the
deltas. So you have the following choices:

IPIA - Immediate Propagate, Immediate Apply 
IPDA - Immediate Propagate, Deferred Apply
DPDA - Deferred Propagate, Deferred Apply

DPIA - makes no sense ..

The easiest is DPDA .. in this model you essentially recompute the
view on demand. The next is IPIA.. with IPIA, you can choose to either
recompute the entire view or only figure out how to translate a delta
into appropriate changes in the mview. After that there is IPDA
.. this involves creating a delta table that is populated based on the
triggers. Every so often you process a set of deltas and recompute the
view (either in its entirety or in parts>

Another way to slice the problem is to limit the scope of queries that
can be used to define views. 

I suggest that first we don't consider joins .. and only consider
grouped aggregates over a single table. Joins are more hairy with
changes in one table potentially requiring a wholesale recomputation
of the join. 

Note that all this discussion is only about the first part of the
problem .. maintenance of the views. The other part, routing is almost
equally complicated .. there you have to solve the query subsumption
problem without also eliminating choices in your access plan. 

As people have said there are plenty of papers on this in the
literature. While I am no theorist I can certainly help with reading
the papers .. not every bit of a paper is very useful. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh




pgsql-hackers by date:

Previous
From: Oliver Elphick
Date:
Subject: initdb should create a warning message [was Re: [ADMIN] Size on Disk]
Next
From: Neil Conway
Date:
Subject: Re: initdb should create a warning message [was Re: