Re: Materialized views proposal - Mailing list pgsql-hackers

From Jonathan Gardner
Subject Re: Materialized views proposal
Date
Msg-id 200311291300.10336.jgardner@jonathangardner.net
Whole thread Raw
In response to Re: Materialized views proposal  (Jonathan Gardner <jonagard@amazon.com>)
Responses Re: Materialized views proposal
List pgsql-hackers
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 26 November 2003 10:58 am, Jonathan Gardner wrote:
> On Wednesday 26 November 2003 09:19, Hannu Krosing wrote:
> > What is needed is good algorithms. Writing C code is secondary to that.
> >
> > Similar problem has kept us from implementing updatable views for quite
> > some time.
>
> You are definitely correct.

Preliminary research has shown that:
1) We would put triggers on each table that contributes to a view.

2) We would only be interested in those inserts and deletes (counting an
update as a delete and then an insert) that satisfy the "WHERE" clause of
the view.

3) We would implement some sort of differential view update scheme based on
the paper "Efficiently Updating Materialized Views"[1]. They classify all
select queries into one of four categories and describe how to go about
producing a differential update. I couldn't understand this part at all.

- ---

I am already too deep in database theory. While I enjoy reading a good
paper, I just don't have the background in DB science like I do in Physics.
For instance, my set theory is very weak (I can't tell if U is a union or
an intersection, for instanct).

At this point, I have a choice. Drop a couple hundred dollars on database
theory textbooks and spend the next three months learning it, or hoping one
of you database theorists out there take pity on me and pick this up or
coach me through it. Any takers on the second one?

- ---

I have a proposal. Let me hear what you think. There will be two kinds of
materialized view in PostgreSQL:

The first is the "snapshot" approach. Provide a materialized view, but don't
provide automatic updating. The user can call a refresh statement to
repopulate the entire view from time to time as appropriate. The benefit of
this is that you can use queries that have functions that are immutable or
stable. (You could probably use volatile as well but I wouldn't recommend
it). This is so easy to implement, it isn't even funny.

Optimizing this would involve collecting all the inserted / updated /deleted
rows since the last snapshot. We can have a logging table that accumulates
all the changes since the last refresh for this purpose. (This kind of
table may exist anyway for replication or other purposes.)

Finally, we could examine the changed rows, ignore the irrelevant ones,
ignore the redundant ones (IE, rows that have been inserted and then
deleted during that time), and decide whether doing a complete refresh
would be quicker than doing several differential updates to the
materialized view.


The second is the automatically updated materialized view. Each insert,
update, or delete executed against the table that are queried for the view
will trigger a function call using the algorithms proposed in [1]. Every
function in the query must be stable.


We could provide a mechanism to alter the materialized view between snapshot
to auto-updated. It would be as simple as refreshing the snapshot and then
enabling the triggers, or as simple as disabling the triggers.


I already have written a set of scripts in perl that provides the first
proposal without the optimization idea. I will put them up at GBorg.


Footnotes:
[1] ftp://ftp.research.microsoft.com/users/palarson/sigmod86.ps

- --
Jonathan Gardner
jgardner@jonathangardner.net
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/yQjZWgwF3QvpWNwRAnqkAKCasoODeV2+KcP52DMXiEyq7pfhmACcCFBV
X28Nd5PvlhR8Xta/G4w2qBQ=
=OLa8
-----END PGP SIGNATURE-----



pgsql-hackers by date:

Previous
From: Lamar Owen
Date:
Subject: Re: -fpic vs. -fPIC
Next
From: Tom Lane
Date:
Subject: Re: $Id$ -> $PostgreSQL$ Change