Re: Materialized views proposal - Mailing list pgsql-hackers
From | Jonathan Gardner |
---|---|
Subject | Re: Materialized views proposal |
Date | |
Msg-id | 200311291300.10336.jgardner@jonathangardner.net Whole thread Raw |
In response to | Re: Materialized views proposal (Jonathan Gardner <jonagard@amazon.com>) |
Responses |
Re: Materialized views proposal
|
List | pgsql-hackers |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wednesday 26 November 2003 10:58 am, Jonathan Gardner wrote: > On Wednesday 26 November 2003 09:19, Hannu Krosing wrote: > > What is needed is good algorithms. Writing C code is secondary to that. > > > > Similar problem has kept us from implementing updatable views for quite > > some time. > > You are definitely correct. Preliminary research has shown that: 1) We would put triggers on each table that contributes to a view. 2) We would only be interested in those inserts and deletes (counting an update as a delete and then an insert) that satisfy the "WHERE" clause of the view. 3) We would implement some sort of differential view update scheme based on the paper "Efficiently Updating Materialized Views"[1]. They classify all select queries into one of four categories and describe how to go about producing a differential update. I couldn't understand this part at all. - --- I am already too deep in database theory. While I enjoy reading a good paper, I just don't have the background in DB science like I do in Physics. For instance, my set theory is very weak (I can't tell if U is a union or an intersection, for instanct). At this point, I have a choice. Drop a couple hundred dollars on database theory textbooks and spend the next three months learning it, or hoping one of you database theorists out there take pity on me and pick this up or coach me through it. Any takers on the second one? - --- I have a proposal. Let me hear what you think. There will be two kinds of materialized view in PostgreSQL: The first is the "snapshot" approach. Provide a materialized view, but don't provide automatic updating. The user can call a refresh statement to repopulate the entire view from time to time as appropriate. The benefit of this is that you can use queries that have functions that are immutable or stable. (You could probably use volatile as well but I wouldn't recommend it). This is so easy to implement, it isn't even funny. Optimizing this would involve collecting all the inserted / updated /deleted rows since the last snapshot. We can have a logging table that accumulates all the changes since the last refresh for this purpose. (This kind of table may exist anyway for replication or other purposes.) Finally, we could examine the changed rows, ignore the irrelevant ones, ignore the redundant ones (IE, rows that have been inserted and then deleted during that time), and decide whether doing a complete refresh would be quicker than doing several differential updates to the materialized view. The second is the automatically updated materialized view. Each insert, update, or delete executed against the table that are queried for the view will trigger a function call using the algorithms proposed in [1]. Every function in the query must be stable. We could provide a mechanism to alter the materialized view between snapshot to auto-updated. It would be as simple as refreshing the snapshot and then enabling the triggers, or as simple as disabling the triggers. I already have written a set of scripts in perl that provides the first proposal without the optimization idea. I will put them up at GBorg. Footnotes: [1] ftp://ftp.research.microsoft.com/users/palarson/sigmod86.ps - -- Jonathan Gardner jgardner@jonathangardner.net Live Free, Use Linux! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/yQjZWgwF3QvpWNwRAnqkAKCasoODeV2+KcP52DMXiEyq7pfhmACcCFBV X28Nd5PvlhR8Xta/G4w2qBQ= =OLa8 -----END PGP SIGNATURE-----
pgsql-hackers by date: