Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. |
Date | |
Msg-id | 1374596854.54580.YahooMailNeo@web162904.mail.bf1.yahoo.com Whole thread Raw |
In response to | Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW
CONCURRENTLY.
|
List | pgsql-hackers |
Andres Freund <andres@2ndquadrant.com> wrote: > Hm. There seems to be more things that need some more improvement > from a quick look. > > First, I have my doubts of the general modus operandy of > CONCURRENTLY here. I think in many, many cases it would be faster > to simply swap in the newly built heap + indexes in concurrently. > I think I have seen that mentioned in the review while mostly > skipping the discussion, but still. That would be easy enough as > of Robert's mvcc patch. Yeah, in many cases you would not want to choose to specify this technique. The point was to have a technique which could run without blocking while a heavy load of queries (including perhaps a very long-running query) was executing. If subsequent events provide an alternative way to do that, it's worth looking at it; but my hope was to get this out of the way to allow time to develop incremental maintenance of matviews for the next CF. If we spend too much time reworking this to micro-optimize it for new patches, incremental maintenance might not happen for 9.4. The bigger point is perhaps syntax. If MVCC catalogs are really at a point where we can substitute a new heap and new indexes for a matview without taking an AccessExclusiveLock, then we should just make the current code do that. I think there is still a place for a differential update for large matviews which only need small changes on a REFRESH, but perhaps the right term for that is not CONCURRENTLY. > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance > be done a good bit earlier? We're executing queries before > that. This can't be in effect while we are creating temp tables. If we're going to support a differential REFRESH technique, it must be done more "surgically" than to wrap the whole REFRESH statement execution. > * The loop over indexes in refresh_by_match_merge should > index_open(ExclusiveLock) the indexes initially instead of > searching the syscache manually. They are opened inside the > loop in many cases anyway. There probably aren't any hazards > currently, but I am not even sure about that. The index_open() > in the loop at the very least processes the invalidation > messages other backends send... Fair enough. That seems like a simple change. > I'd even suggest using BuildIndexInfo() or such on the indexes, > then you could use ->ii_Expressions et al instead of peeking > into indkeys by hand. Given that the function is in a source file described as containing "code to create and destroy POSTGRES index relations" and the comments for that function say that it "stores the information about the index that's needed by FormIndexDatum, which is used for both index_build() and later insertion of individual index tuples," and we're not going to create or destroy an index or call FormIndexDatum or insert individual index tuples from this code, it would seem to be a significant expansion of the charter of that function. What benefits do you see to using that level? > * All not explicitly looked up operators should be qualified > using OPERATOR(). It's easy to define your own = operator for > tid and set the search_path to public,pg_catalog. Now, the > whole thing is restricted to talbe owners afaics, making this > decidedly less dicey, but still. But if anyobdy actually uses a > search_path like the above you can easily hurt them. > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM > x.* operations. > * (y.*) = (x.*) also needs to do proper operator lookups. I wasn't aware that people could override the equality operators for tid and RECORD, so that seemed like unnecessary overhead. I can pull the operators from the btree AM if people can do that. > * OpenMatViewIncrementalMaintenance() et al seem to be > underdocumented. If the adjacent comment for MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain about what OpenMatViewIncrementalMaintenance() and CloseMatViewIncrementalMaintenance() are for, I can add comments. At the time I wrote it, it seemed to me to be redundant to have such comments, and would cause people to waste more time looking at them then would be saved by anything they could add to what the function names and one-line function bodies conveyed; but if there was doubt in your mind about when they should be called, I'll add something. Would it help to move the static functions underneath the (documented) public function and its comment? > * why is it even allowed that matview_maintenance_depth is > 1? > Unless I miss something there doesn't seem to be support for a > recursive refresh whatever that would mean? I was trying to create the function in a way that would be useful for incremental maintenance, too. Those could conceivably recurse. Also, this way bugs in usage are more easily detected. > * INSERT and DELETE should also alias the table names, to make > sure there cannot be issues with the nested queries. I don't see the hazard -- could you elaborate? > * I'd strongly suggest more descriptive table aliases than x, y, > d. Those make the statements rather hard to parse. I guess. Those are intended to be internal, but I guess there's no reason not to be more verbose in the aliases. > * SPI_exec() is deprecated in favor of SPI_execute() According to what? The documentation of SPI_exec doesn't say anything about it being deprecated; it does say: | SPI_exec is the same as SPI_execute, with the latter's read_only | parameter always taken as false. I didn't see any particular benefit to using the more verbose form, but it's certainly easy enough to change if people don't like the shorter form. > * ISTM deferred uniqueness checks and similar things should be > enforced to run ub refresh_by_match_merge() "ub"? That looks like a typo, but I'm not sure what you mean. Could you show a case where there is a hazard? There was also a mention of a temp table left around. I had intended to specify ON COMMIT DROP for the internally used temp tables, but apparently lost track of that. I'll wait for the thread to settle down to offer a patch (or set of patches). -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: