Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. |
Date | |
Msg-id | 20130731013914.GD19053@alap2.anarazel.de Whole thread Raw |
In response to | Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. (Kevin Grittner <kgrittn@ymail.com>) |
Responses |
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW
CONCURRENTLY.
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. |
List | pgsql-hackers |
Hi Kevin, On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > > Hm. There seems to be more things that need some more improvement > > from a quick look. > > > > First, I have my doubts of the general modus operandy of > > CONCURRENTLY here. I think in many, many cases it would be faster > > to simply swap in the newly built heap + indexes in concurrently. > > I think I have seen that mentioned in the review while mostly > > skipping the discussion, but still. That would be easy enough as > > of Robert's mvcc patch. > > Yeah, in many cases you would not want to choose to specify this > technique. The point was to have a technique which could run > without blocking while a heavy load of queries (including perhaps a > very long-running query) was executing. If subsequent events > provide an alternative way to do that, it's worth looking at it; > but my hope was to get this out of the way to allow time to develop > incremental maintenance of matviews for the next CF. I thought there was a pretty obvious way to do this concurrently. But after mulling over it for a few days I either miss a connection I made previously or, more likely, I have missed some fundamental difficulties previously. I'd would be fairly easy if indexes were attached to a relations relfilenode, not a relations oid... But changing that certainly doesn't fall in the "easy enough" category anymore. > The bigger point is perhaps syntax. If MVCC catalogs are really at > a point where we can substitute a new heap and new indexes for a > matview without taking an AccessExclusiveLock, then we should just > make the current code do that. I think there is still a place for > a differential update for large matviews which only need small > changes on a REFRESH, but perhaps the right term for that is not > CONCURRENTLY. Hm. Yes. Especially as any CONCURRENTLY implementation building and swapping in a new heap that I can think of requires to be run outside a transaction (like CIC et al). > > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance > > be done a good bit earlier? We're executing queries before > > that. > > This can't be in effect while we are creating temp tables. If > we're going to support a differential REFRESH technique, it must be > done more "surgically" than to wrap the whole REFRESH statement > execution. Sure, it cannot be active when creating the temp table. But it's currently inactive while you SPI_execute queries. That can't be right. > > I'd even suggest using BuildIndexInfo() or such on the indexes, > > then you could use ->ii_Expressions et al instead of peeking > > into indkeys by hand. > > Given that the function is in a source file described as containing > "code to create and destroy POSTGRES index relations" and the > comments for that function say that it "stores the information > about the index that's needed by FormIndexDatum, which is used for > both index_build() and later insertion of individual index tuples," > and we're not going to create or destroy an index or call > FormIndexDatum or insert individual index tuples from this code, it > would seem to be a significant expansion of the charter of that > function. What benefits do you see to using that level? I'd prevent you from needing to peek into indkeys. Note that it's already used in various places. > > * All not explicitly looked up operators should be qualified > > using OPERATOR(). It's easy to define your own = operator for > > tid and set the search_path to public,pg_catalog. Now, the > > whole thing is restricted to talbe owners afaics, making this > > decidedly less dicey, but still. But if anyobdy actually uses a > > search_path like the above you can easily hurt them. > > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM > > x.* operations. > > * (y.*) = (x.*) also needs to do proper operator lookups. > > I wasn't aware that people could override the equality operators > for tid and RECORD, so that seemed like unnecessary overhead. I > can pull the operators from the btree AM if people can do that. Sure, You don't need any special privs for it: CREATE OPERATOR public.= ( LEFTARG = tid, RIGHTARG = tid, PROCEDURE = tideq); That's obviously not doing anything nefarious, since it calls the builtin function, but you get the idea. I think for the cases where you're comparing tids it's fine to just to hardcode the operator to OPERATOR(pg_catalog.=). > > * OpenMatViewIncrementalMaintenance() et al seem to be > > underdocumented. > > If the adjacent comment for > MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain > about what OpenMatViewIncrementalMaintenance() and > CloseMatViewIncrementalMaintenance() are for, I can add comments. > At the time I wrote it, it seemed to me to be redundant to have > such comments, and would cause people to waste more time looking at > them then would be saved by anything they could add to what the > function names and one-line function bodies conveyed; but if there > was doubt in your mind about when they should be called, I'll add > something. I am not asking for comments that basically paraphrase the functionname, I don't like those. More for something like: "Normally DML statements aren't allowed on materialized views, but since REFRESH CONCURRENTLY internally is implemented by running a bunch of queries executing DML (c.f. refresh_by_match_merge()) we need to temporarily allow it while refreshing. Callsites preventing DML and similar on materialized views thus need to check whether we're currently refreshing." > Would it help to move the static functions underneath the > (documented) public function and its comment? Yes. > > * why is it even allowed that matview_maintenance_depth is > 1? > > Unless I miss something there doesn't seem to be support for a > > recursive refresh whatever that would mean? > > I was trying to create the function in a way that would be useful > for incremental maintenance, too. Those could conceivably recurse. > Also, this way bugs in usage are more easily detected. Also, that belongs in a comment. > > * INSERT and DELETE should also alias the table names, to make > > sure there cannot be issues with the nested queries. > > I don't see the hazard -- could you elaborate? I don't think there is an actual hazard *right now*. But at least for the DELETE case it took me a second or two to make sure there's no case a matview called 'd' cannot cause problems. > > * I'd strongly suggest more descriptive table aliases than x, y, > > d. Those make the statements rather hard to parse. > > I guess. Those are intended to be internal, but I guess there's no > reason not to be more verbose in the aliases. Well, for one, other people will read that code every now and then. I am not 100% convinced that all the statements are correct, and it's more effort to do so right now. Also, those statements will show up in error messages. I first wanted to amend the above to say that they will also show up in pg_stat_activity, but it doesn't look like it atm, you're never setting anything. Which is consistent with other commands, even if it strikes me as suboptimal. > > * SPI_exec() is deprecated in favor of SPI_execute() > According to what? The documentation of SPI_exec doesn't say > anything about it being deprecated; it does say: I only remembered the following comment from spi.c: /* Obsolete version of SPI_execute */ I guess, if the docs don't show that... > > * ISTM deferred uniqueness checks and similar things should be > > enforced to run ub refresh_by_match_merge() > > "ub"? That looks like a typo, but I'm not sure what you mean. > Could you show a case where there is a hazard? I think I wanted to type "in" but moved one row to far to the left... What I am thinking of is that you'll get a successfull REFRESH CONCURRENTLY but it will later error out at COMMIT time because there were constraint violations. Afaik we don't have any such behaviour for existing DDL and I don't like introducing it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: