Re: [HACKERS] [SQL] Materialized View Summary - Mailing list pgsql-performance
From | Robert Treat |
---|---|
Subject | Re: [HACKERS] [SQL] Materialized View Summary |
Date | |
Msg-id | 1077723976.15368.6115.camel@camel Whole thread Raw |
In response to | Re: [HACKERS] [SQL] Materialized View Summary ("Jonathan M. Gardner" <jgardner@jonathangardner.net>) |
List | pgsql-performance |
On Wed, 2004-02-25 at 03:19, Jonathan M. Gardner wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I'm not sure if my original reply made it through. Ignore the last one if > it did. But I liked the last one :-) > > On Tuesday 24 February 2004 1:48 pm, Robert Treat wrote: > > On Tue, 2004-02-24 at 12:11, Richard Huxton wrote: > > > On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote: > > > > I've written a summary of my findings on implementing and using > > > > materialized views in PostgreSQL. I've already deployed eagerly > > > > updating materialized views on several views in a production > > > > environment for a company called RedWeek: http://redweek.com/. As a > > > > result, some queries that were taking longer than 30 seconds to run > > > > now run in a fraction of a millisecond. > > > > > > > > You can view my summary at > > > > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.h > > > >tml > > > > have you done much concurrency testing on your snapshot views? I > > implemented a similar scheme in one of my databases but found problems > > when I had concurrent "refresh attempts". I ended up serializing the > > calls view LOCKing, which was ok for my needs, but I thought > > potentially problematic in other cases. > > > > We are running into some small problems with deadlocks and multiple > inserts. It's not a problem unless we do a mass update to the data or > something like that. I'm interested in how you solved your problem. > Well, I have two different cases actually. In one case I have a master table with what are essentially 4 or 5 matviews based off of that. I don't allow updates to the matviews, only to the master table, and only via stored procedures. This would work better if locking semantics inside of pl functions worked properly, but currently we have the application lock the table in exclusive access mode and then call the function to make the data changes which then fires off a function to update the matviews. Since it's all within a transaction, readers of the matviews are oblivious to the change. IMO this whole method is a wizardry in database hack jobs that I would love to replace. The second case, and this one being much simpler, started out as a view that does aggregation across several other views and tables, which is pretty resource intensive but only returns 4 rows. I refresh the matview via a cron job which basically does a SELECT * FOR UPDATE on the matview, deletes the entire contents, then does an INSERT INTO matview SELECT * FROM view. Again since it's in a transaction, readers of the matview are happy (and apps are only granted select on the matview). Concurrency is kept because the cron job must wait to get a LOCK on the table before it can proceed with the delete/update. I have a feeling that this method could fall over given a high enough number of concurrent updaters, but works pretty well for our needs. > I am playing with an exclusive lock scheme that will lock all the > materialized views with an exclusive lock (see Section 12.3 for a > reminder on what exactly this means). The locks have to occur in order, > so I use a recursive function to traverse a dependency tree to the root > and then lock from there. Right now, we only have one materialized view > tree, but I can see some schemas having multiple seperate trees with > multiple roots. So I put in an ordering to lock the tables in a > pre-defined order. > > But if the two dependency trees are totally seperate, it is possible for > one transaction to lock tree A and then tree B, and for another to lock > tree B and then tree A, causing deadlock. > > Unfortunately, I can't force any update to the underlying tables to force > this locking function to be called. So we will probably call this > manually before we touch any of those tables. Yeah, I ran into similar problems as this, but ISTM you could do a before update trigger on the matview to do the locking (though I'd guess this would end in trouble due to plpgsql lock semantics, so maybe i shouldn't send you down a troubled road...) > > In the future, it would be nice to have a hook into the locking mechanism > so any kind of lock on the underlying tables can trigger this. > > Also, building the dependency trees is completely manual. Until I can get > some functions to actually assemble the triggers and such, automatic > building of the trees will be difficult. > I just noticed that your summary doesn't make use of postgresql RULES in any way, how much have you traveled down that path? We had cooked up a scheme for our second case where we would have a table that held an entry for the matview and then a timestamp of the last update/insert into any of the base tables the matview depended on. when then would create rules on all the base tables to do an update to the refresh table any time they were updated/inserted/deleted. We would then put a corresponding rule on the matview so that on each select from the matview, it would check to see if any of it's base tables had changed and if so fire off a refresh of itself. We ended up abandoning this idea as the complexity seemed to high when the simple scheme above worked equally well for our needs. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
pgsql-performance by date: