Thread: Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Kevin Grittner <kgrittn@postgresql.org> writes: > Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. regards, tom lane
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Hitoshi Harada
Date:
On Wed, Jul 17, 2013 at 7:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kevin Grittner <kgrittn@postgresql.org> writes: >> Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. > > The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch > is broken. > Looks like rd_indpred is not correct if index relation is fresh. Something like this works for me. diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c index edd34ff..46149ee 100644 --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -634,7 +634,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid) /* Skip partial indexes. */ indexRel = index_open(index->indexrelid, RowExclusiveLock); - if (indexRel->rd_indpred != NIL) + if (RelationGetIndexPredicate(indexRel) != NIL) { index_close(indexRel, NoLock); ReleaseSysCache(indexTuple); -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > Looks like rd_indpred is not correct if index relation is fresh. > Something like this works for me. > - if (indexRel->rd_indpred != NIL) > + if (RelationGetIndexPredicate(indexRel) != NIL) Hm, yeah, the direct access to rd_indpred is certainly wrong. Will apply, thanks! regards, tom lane
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Andres Freund
Date:
On 2013-07-17 10:11:28 -0400, Tom Lane wrote: > Kevin Grittner <kgrittn@postgresql.org> writes: > > Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. > > The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch > is broken. Jagarundi still tells that story. At least something like the following patch seems to be required. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Robert Haas
Date:
On Mon, Jul 22, 2013 at 6:01 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-07-17 10:11:28 -0400, Tom Lane wrote: >> Kevin Grittner <kgrittn@postgresql.org> writes: >> > Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. >> >> The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch >> is broken. > > Jagarundi still tells that story. At least something like the following > patch seems to be required. Thanks, I was noticing that breakage today, too. I committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-07-17 10:11:28 -0400, Tom Lane wrote: >> The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch >> is broken. > Jagarundi still tells that story. Uh, no. Jagarundi was perfectly happy for several build cycles after I committed 405a468. The buildfarm history shows that it started complaining again after this change: f01d1ae Add infrastructure for mapping relfilenodes to relation OIDs > At least something like the following patch seems to be required. This does look like a necessary change, but I suspect there is also something rotten in f01d1ae. regards, tom lane
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Andres Freund
Date:
On 2013-07-22 19:09:13 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-07-17 10:11:28 -0400, Tom Lane wrote: > >> The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch > >> is broken. > > > Jagarundi still tells that story. > > Uh, no. Jagarundi was perfectly happy for several build cycles after > I committed 405a468. The buildfarm history shows that it started > complaining again after this change: > > f01d1ae Add infrastructure for mapping relfilenodes to relation OIDs Huh? That's rather strange. No code from that patch is even exececuted until the alter_table regression tests way later. So I can't see how that patch could cause the matview error. Which is pretty clearly using an freed relcache entry. I think this may be a timing issue. > > At least something like the following patch seems to be required. > > This does look like a necessary change, but I suspect there is also > something rotten in f01d1ae. The alter table failure lateron seems to be "ERROR: relation "tmp" already exists", which just seems to be a consequence of incomplete cleanup due to the earlier crashes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Andres Freund
Date:
On 2013-07-23 00:01:50 +0200, Andres Freund wrote: > On 2013-07-17 10:11:28 -0400, Tom Lane wrote: > > Kevin Grittner <kgrittn@postgresql.org> writes: > > > Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. > > > > The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch > > is broken. > > Jagarundi still tells that story. At least something like the following > patch seems to be required. Hm. There seems to be more things that need some more improvement from a quick look. First, I have my doubts of the general modus operandy of CONCURRENTLY here. I think in many, many cases it would be faster to simply swap in the newly built heap + indexes in concurrently. I think I have seen that mentioned in the review while mostly skipping the discussion, but still. That would be easy enough as of Robert's mvcc patch. * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance be done a good bit earlier? We're executing queries beforethat. * The loop over indexes in refresh_by_match_merge should index_open(ExclusiveLock) the indexes initially instead of searchingthe syscache manually. They are opened inside the loop in many cases anyway. There probably aren't any hazards currently,but I am not even sure about that. The index_open() in the loop at the very least processes the invalidation messagesother backends send... I'd even suggest using BuildIndexInfo() or such on the indexes, then you could use ->ii_Expressionset al instead of peeking into indkeys by hand. * All not explicitly looked up operators should be qualified using OPERATOR(). It's easy to define your own = operator fortid and set the search_path to public,pg_catalog. Now, the whole thing is restricted to talbe owners afaics, making thisdecidedly less dicey, but still. But if anyobdy actually uses a search_path like the above you can easily hurt them. * Same seems to hold true for the y = x and y.* IS DISTINCT FROM x.* operations. * (y.*) = (x.*) also needs to do proper operator lookups. * OpenMatViewIncrementalMaintenance() et al seem to be underdocumented. * why is it even allowed that matview_maintenance_depth is > 1? Unless I miss something there doesn't seem to be supportfor a recursive refresh whatever that would mean? * INSERT and DELETE should also alias the table names, to make sure there cannot be issues with the nested queries. * I'd strongly suggest more descriptive table aliases than x, y, d. Those make the statements rather hard to parse. * SPI_exec() is deprecated in favor of SPI_execute() * ISTM deferred uniqueness checks and similar things should be enforced to run ub refresh_by_match_merge() I think this patch/feature needs a good bit of work... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Kevin Grittner
Date:
Andres Freund <andres@2ndquadrant.com> wrote: > Hm. There seems to be more things that need some more improvement > from a quick look. > > First, I have my doubts of the general modus operandy of > CONCURRENTLY here. I think in many, many cases it would be faster > to simply swap in the newly built heap + indexes in concurrently. > I think I have seen that mentioned in the review while mostly > skipping the discussion, but still. That would be easy enough as > of Robert's mvcc patch. Yeah, in many cases you would not want to choose to specify this technique. The point was to have a technique which could run without blocking while a heavy load of queries (including perhaps a very long-running query) was executing. If subsequent events provide an alternative way to do that, it's worth looking at it; but my hope was to get this out of the way to allow time to develop incremental maintenance of matviews for the next CF. If we spend too much time reworking this to micro-optimize it for new patches, incremental maintenance might not happen for 9.4. The bigger point is perhaps syntax. If MVCC catalogs are really at a point where we can substitute a new heap and new indexes for a matview without taking an AccessExclusiveLock, then we should just make the current code do that. I think there is still a place for a differential update for large matviews which only need small changes on a REFRESH, but perhaps the right term for that is not CONCURRENTLY. > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance > be done a good bit earlier? We're executing queries before > that. This can't be in effect while we are creating temp tables. If we're going to support a differential REFRESH technique, it must be done more "surgically" than to wrap the whole REFRESH statement execution. > * The loop over indexes in refresh_by_match_merge should > index_open(ExclusiveLock) the indexes initially instead of > searching the syscache manually. They are opened inside the > loop in many cases anyway. There probably aren't any hazards > currently, but I am not even sure about that. The index_open() > in the loop at the very least processes the invalidation > messages other backends send... Fair enough. That seems like a simple change. > I'd even suggest using BuildIndexInfo() or such on the indexes, > then you could use ->ii_Expressions et al instead of peeking > into indkeys by hand. Given that the function is in a source file described as containing "code to create and destroy POSTGRES index relations" and the comments for that function say that it "stores the information about the index that's needed by FormIndexDatum, which is used for both index_build() and later insertion of individual index tuples," and we're not going to create or destroy an index or call FormIndexDatum or insert individual index tuples from this code, it would seem to be a significant expansion of the charter of that function. What benefits do you see to using that level? > * All not explicitly looked up operators should be qualified > using OPERATOR(). It's easy to define your own = operator for > tid and set the search_path to public,pg_catalog. Now, the > whole thing is restricted to talbe owners afaics, making this > decidedly less dicey, but still. But if anyobdy actually uses a > search_path like the above you can easily hurt them. > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM > x.* operations. > * (y.*) = (x.*) also needs to do proper operator lookups. I wasn't aware that people could override the equality operators for tid and RECORD, so that seemed like unnecessary overhead. I can pull the operators from the btree AM if people can do that. > * OpenMatViewIncrementalMaintenance() et al seem to be > underdocumented. If the adjacent comment for MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain about what OpenMatViewIncrementalMaintenance() and CloseMatViewIncrementalMaintenance() are for, I can add comments. At the time I wrote it, it seemed to me to be redundant to have such comments, and would cause people to waste more time looking at them then would be saved by anything they could add to what the function names and one-line function bodies conveyed; but if there was doubt in your mind about when they should be called, I'll add something. Would it help to move the static functions underneath the (documented) public function and its comment? > * why is it even allowed that matview_maintenance_depth is > 1? > Unless I miss something there doesn't seem to be support for a > recursive refresh whatever that would mean? I was trying to create the function in a way that would be useful for incremental maintenance, too. Those could conceivably recurse. Also, this way bugs in usage are more easily detected. > * INSERT and DELETE should also alias the table names, to make > sure there cannot be issues with the nested queries. I don't see the hazard -- could you elaborate? > * I'd strongly suggest more descriptive table aliases than x, y, > d. Those make the statements rather hard to parse. I guess. Those are intended to be internal, but I guess there's no reason not to be more verbose in the aliases. > * SPI_exec() is deprecated in favor of SPI_execute() According to what? The documentation of SPI_exec doesn't say anything about it being deprecated; it does say: | SPI_exec is the same as SPI_execute, with the latter's read_only | parameter always taken as false. I didn't see any particular benefit to using the more verbose form, but it's certainly easy enough to change if people don't like the shorter form. > * ISTM deferred uniqueness checks and similar things should be > enforced to run ub refresh_by_match_merge() "ub"? That looks like a typo, but I'm not sure what you mean. Could you show a case where there is a hazard? There was also a mention of a temp table left around. I had intended to specify ON COMMIT DROP for the internally used temp tables, but apparently lost track of that. I'll wait for the thread to settle down to offer a patch (or set of patches). -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Andres Freund
Date:
Hi Kevin, On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > > Hm. There seems to be more things that need some more improvement > > from a quick look. > > > > First, I have my doubts of the general modus operandy of > > CONCURRENTLY here. I think in many, many cases it would be faster > > to simply swap in the newly built heap + indexes in concurrently. > > I think I have seen that mentioned in the review while mostly > > skipping the discussion, but still. That would be easy enough as > > of Robert's mvcc patch. > > Yeah, in many cases you would not want to choose to specify this > technique. The point was to have a technique which could run > without blocking while a heavy load of queries (including perhaps a > very long-running query) was executing. If subsequent events > provide an alternative way to do that, it's worth looking at it; > but my hope was to get this out of the way to allow time to develop > incremental maintenance of matviews for the next CF. I thought there was a pretty obvious way to do this concurrently. But after mulling over it for a few days I either miss a connection I made previously or, more likely, I have missed some fundamental difficulties previously. I'd would be fairly easy if indexes were attached to a relations relfilenode, not a relations oid... But changing that certainly doesn't fall in the "easy enough" category anymore. > The bigger point is perhaps syntax. If MVCC catalogs are really at > a point where we can substitute a new heap and new indexes for a > matview without taking an AccessExclusiveLock, then we should just > make the current code do that. I think there is still a place for > a differential update for large matviews which only need small > changes on a REFRESH, but perhaps the right term for that is not > CONCURRENTLY. Hm. Yes. Especially as any CONCURRENTLY implementation building and swapping in a new heap that I can think of requires to be run outside a transaction (like CIC et al). > > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance > > be done a good bit earlier? We're executing queries before > > that. > > This can't be in effect while we are creating temp tables. If > we're going to support a differential REFRESH technique, it must be > done more "surgically" than to wrap the whole REFRESH statement > execution. Sure, it cannot be active when creating the temp table. But it's currently inactive while you SPI_execute queries. That can't be right. > > I'd even suggest using BuildIndexInfo() or such on the indexes, > > then you could use ->ii_Expressions et al instead of peeking > > into indkeys by hand. > > Given that the function is in a source file described as containing > "code to create and destroy POSTGRES index relations" and the > comments for that function say that it "stores the information > about the index that's needed by FormIndexDatum, which is used for > both index_build() and later insertion of individual index tuples," > and we're not going to create or destroy an index or call > FormIndexDatum or insert individual index tuples from this code, it > would seem to be a significant expansion of the charter of that > function. What benefits do you see to using that level? I'd prevent you from needing to peek into indkeys. Note that it's already used in various places. > > * All not explicitly looked up operators should be qualified > > using OPERATOR(). It's easy to define your own = operator for > > tid and set the search_path to public,pg_catalog. Now, the > > whole thing is restricted to talbe owners afaics, making this > > decidedly less dicey, but still. But if anyobdy actually uses a > > search_path like the above you can easily hurt them. > > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM > > x.* operations. > > * (y.*) = (x.*) also needs to do proper operator lookups. > > I wasn't aware that people could override the equality operators > for tid and RECORD, so that seemed like unnecessary overhead. I > can pull the operators from the btree AM if people can do that. Sure, You don't need any special privs for it: CREATE OPERATOR public.= ( LEFTARG = tid, RIGHTARG = tid, PROCEDURE = tideq); That's obviously not doing anything nefarious, since it calls the builtin function, but you get the idea. I think for the cases where you're comparing tids it's fine to just to hardcode the operator to OPERATOR(pg_catalog.=). > > * OpenMatViewIncrementalMaintenance() et al seem to be > > underdocumented. > > If the adjacent comment for > MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain > about what OpenMatViewIncrementalMaintenance() and > CloseMatViewIncrementalMaintenance() are for, I can add comments. > At the time I wrote it, it seemed to me to be redundant to have > such comments, and would cause people to waste more time looking at > them then would be saved by anything they could add to what the > function names and one-line function bodies conveyed; but if there > was doubt in your mind about when they should be called, I'll add > something. I am not asking for comments that basically paraphrase the functionname, I don't like those. More for something like: "Normally DML statements aren't allowed on materialized views, but since REFRESH CONCURRENTLY internally is implemented by running a bunch of queries executing DML (c.f. refresh_by_match_merge()) we need to temporarily allow it while refreshing. Callsites preventing DML and similar on materialized views thus need to check whether we're currently refreshing." > Would it help to move the static functions underneath the > (documented) public function and its comment? Yes. > > * why is it even allowed that matview_maintenance_depth is > 1? > > Unless I miss something there doesn't seem to be support for a > > recursive refresh whatever that would mean? > > I was trying to create the function in a way that would be useful > for incremental maintenance, too. Those could conceivably recurse. > Also, this way bugs in usage are more easily detected. Also, that belongs in a comment. > > * INSERT and DELETE should also alias the table names, to make > > sure there cannot be issues with the nested queries. > > I don't see the hazard -- could you elaborate? I don't think there is an actual hazard *right now*. But at least for the DELETE case it took me a second or two to make sure there's no case a matview called 'd' cannot cause problems. > > * I'd strongly suggest more descriptive table aliases than x, y, > > d. Those make the statements rather hard to parse. > > I guess. Those are intended to be internal, but I guess there's no > reason not to be more verbose in the aliases. Well, for one, other people will read that code every now and then. I am not 100% convinced that all the statements are correct, and it's more effort to do so right now. Also, those statements will show up in error messages. I first wanted to amend the above to say that they will also show up in pg_stat_activity, but it doesn't look like it atm, you're never setting anything. Which is consistent with other commands, even if it strikes me as suboptimal. > > * SPI_exec() is deprecated in favor of SPI_execute() > According to what? The documentation of SPI_exec doesn't say > anything about it being deprecated; it does say: I only remembered the following comment from spi.c: /* Obsolete version of SPI_execute */ I guess, if the docs don't show that... > > * ISTM deferred uniqueness checks and similar things should be > > enforced to run ub refresh_by_match_merge() > > "ub"? That looks like a typo, but I'm not sure what you mean. > Could you show a case where there is a hazard? I think I wanted to type "in" but moved one row to far to the left... What I am thinking of is that you'll get a successfull REFRESH CONCURRENTLY but it will later error out at COMMIT time because there were constraint violations. Afaik we don't have any such behaviour for existing DDL and I don't like introducing it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Noah Misch
Date:
On Wed, Jul 31, 2013 at 03:39:14AM +0200, Andres Freund wrote: > On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote: > > Andres Freund <andres@2ndquadrant.com> wrote: > > > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance > > > be done a good bit earlier? We're executing queries before > > > that. > > > > This can't be in effect while we are creating temp tables. If > > we're going to support a differential REFRESH technique, it must be > > done more "surgically" than to wrap the whole REFRESH statement > > execution. That InSecurityRestrictedOperation() check happens in DefineRelation(); I advise bypassing it by calling heap_create_with_catalog() directly. make_new_heap() is a comparable example. It's a good thing for more reasons than the one prompting this. For example, the current implementation will fail if the caller lacks TEMP permission on the database. Though it's probably a rare thing to deny TEMP but allow creating permanent objects such as a MV, that's a wart. > Sure, it cannot be active when creating the temp table. But it's > currently inactive while you SPI_execute queries. That can't be right. It may be okay once the operator search issues Andres notes later are fixed. The queries that now run outside a security restricted context _should_ call only functions associated with default B-tree or hash equality operators. Since only superusers can define operator classes, we can assume that those operators never do something unseemly. But if the code will indeed be safe for that reason, I would hope for a comment explaining as much. That would also make the SetUserIdAndSecContext() calls later in refresh_by_match_merge() unnecessary. All that being said, the implementation would be more robust if we remove the SQL-level use of CREATE TEMP TABLE and return to running the entire operation as the MV owner. > > > * All not explicitly looked up operators should be qualified > > > using OPERATOR(). It's easy to define your own = operator for > > > tid and set the search_path to public,pg_catalog. Now, the > > > whole thing is restricted to talbe owners afaics, making this > > > decidedly less dicey, but still. But if anyobdy actually uses a > > > search_path like the above you can easily hurt them. > > > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM > > > x.* operations. > > > * (y.*) = (x.*) also needs to do proper operator lookups. A few supplements to Andres's comments: 1. ri_triggers.c is the model for constructing queries in a manner that guarantees uniform semantics in the face of overloading and search_path variations. Check out how it handles these issues. 2. DISTINCT FROM and IN syntax cannot be used in this kind of query at all, because they always depend on the search path. However, you could construct your own DistinctExpr node referencing the operator you want. Here's an SQL statement generated by this code that currently uses "rowvar IS DISTINCT FROM rowvar": CREATE TEMP TABLE pg_temp_1.pg_temp_41108_2 AS SELECT x.ctid AS tid, y FROM public.mv x FULL JOIN pg_temp_1.pg_temp_41108 y ON(y.key OPERATOR(pg_catalog.=) x.key AND y = x) WHERE (y.*) IS DISTINCT FROM (x.*) ORDER BY tid The point of the "(y.*) IS DISTINCT FROM (x.*)" is just to filter out the inner portion of the full join, right? Could write that a different way. > I think for the cases where you're comparing tids it's fine to just to > hardcode the operator to OPERATOR(pg_catalog.=). Yes, that's fine whenever you're dealing with a known data type that uses "=" as its default B-tree equality operator (e.g. nearly every built-in type). So "(x.*) OPERATOR(pg_catalog.=) (y.*)" for records is fine, too. Kevin, this code is aware that "rowvar = rowvar" uses DISTINCT FROM semantics for individual columns, right? The block comment at refresh_by_match_merge() talks about NULL comparisons, but I couldn't tell definitively whether it recognizes that fact. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Kevin Grittner
Date:
Some of the issues raised by Andres and Noah have been addressed. These all seemed simple and non-controversial, so I've just applied the suggested fixes. Some issues remain, such as how best to create the temp table used for the "diff" data, and the related simplification of the security context swapping that might become possible with a change there. Review of that area has raised a couple other questions that I'm looking into. These all probably amount to enough that I will add the patch(es) to address them to the next CF. Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote: >> Andres Freund <andres@2ndquadrant.com> wrote: >>> The loop over indexes in refresh_by_match_merge should >>> index_open(ExclusiveLock) the indexes initially instead of >>> searching the syscache manually. They are opened inside the >>> loop in many cases anyway. There probably aren't any hazards >>> currently, but I am not even sure about that. The index_open() >>> in the loop at the very least processes the invalidation >>> messages other backends send... Fixed. >>> I'd even suggest using BuildIndexInfo() or such on the indexes, >>> then you could use ->ii_Expressions et al instead of peeking >>> into indkeys by hand. >> >> Given that the function is in a source file described as containing >> "code to create and destroy POSTGRES index relations" and the >> comments for that function say that it "stores the information >> about the index that's needed by FormIndexDatum, which is used for >> both index_build() and later insertion of individual index tuples," >> and we're not going to create or destroy an index or call >> FormIndexDatum or insert individual index tuples from this code, it >> would seem to be a significant expansion of the charter of that >> function. What benefits do you see to using that level? > > I'd prevent you from needing to peek into indkeys. Note that it's > already used in various places. I looked at where it was and wasn't used, and continue to be skeptical. For example, both techniques are used in tablecmds.c; the technique you suggest is used where an index is being created, dropped, or index tuples are being manipulated, while the Form_pg_index structure is being used when the definition of the index is being examined without directly manipulating it. Compare what is being done in my code with the existing code for ATExecDropNotNull(), for example. >>> [while comparison of indexed columns used OPERATOR() correctly, >>> comparison of tid and rowvar values did not] >> I wasn't aware that people could override the equality operators >> for tid and RECORD >> [example proving the possibility] >> I think for the cases where you're comparing tids it's fine to just >> to hardcode the operator to OPERATOR(pg_catalog.=). Done. >>> * I'd strongly suggest more descriptive table aliases than x, y, >>> d. Those make the statements rather hard to parse. >> >> I guess. Those are intended to be internal, but I guess there's no >> reason not to be more verbose in the aliases. > > Well, for one, other people will read that code every now and then. I am > not 100% convinced that all the statements are correct, and it's more > effort to do so right now. Also, those statements will show up in error > messages. Done. > What I am thinking of is that you'll get a successfull REFRESH > CONCURRENTLY but it will later error out at COMMIT time because there > were constraint violations. Afaik we don't have any such behaviour for > existing DDL and I don't like introducing it. REFRESH MATERIALIZED VIEW CONCURRENTLY is definitely not DDL. It is DML, and behavior should be consistent with that. (Note that the definition of the matview remains exactly the same after the statement executes as it was before; only the data is modified.) Without the CONCURRENTLY clause it's in the same sort of gray area as TRUNCATE TABLE, where it is essentially DML, but the implementation details are similar to that of DDL, so it may sometimes be hard to avoid DDL-like behaviors, even though it would be best to do so. We have no such problem here. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
From
Andres Freund
Date:
Hi On 2013-08-05 08:37:57 -0700, Kevin Grittner wrote: > Some of the issues raised by Andres and Noah have been addressed. > These all seemed simple and non-controversial, so I've just applied > the suggested fixes. Cool! > >>> I'd even suggest using BuildIndexInfo() or such on the indexes, > >>> then you could use ->ii_Expressions et al instead of peeking > >>> into indkeys by hand. > >> > >> Given that the function is in a source file described as containing > >> "code to create and destroy POSTGRES index relations" and the > >> comments for that function say that it "stores the information > >> about the index that's needed by FormIndexDatum, which is used for > >> both index_build() and later insertion of individual index tuples," > >> and we're not going to create or destroy an index or call > >> FormIndexDatum or insert individual index tuples from this code, it > >> would seem to be a significant expansion of the charter of that > >> function. What benefits do you see to using that level? > > > > I'd prevent you from needing to peek into indkeys. Note that it's > > already used in various places. > > I looked at where it was and wasn't used, and continue to be > skeptical. For example, both techniques are used in tablecmds.c; > the technique you suggest is used where an index is being created, > dropped, or index tuples are being manipulated, while the > Form_pg_index structure is being used when the definition of the > index is being examined without directly manipulating it. Compare > what is being done in my code with the existing code for > ATExecDropNotNull(), for example. The RelationGetIndexExpressions() you mentioned in the commit sounds like a good plan to me. Didn't remember that existed. Don't think the ATExecDropNotNull() comparison is really valid, we need to know more details there, but anyway, you've found something a good bit better. > > What I am thinking of is that you'll get a successfull REFRESH > > CONCURRENTLY but it will later error out at COMMIT time because there > > were constraint violations. Afaik we don't have any such behaviour for > > existing DDL and I don't like introducing it.0 > > REFRESH MATERIALIZED VIEW CONCURRENTLY is definitely not DDL. It > is DML, and behavior should be consistent with that. (Note that > the definition of the matview remains exactly the same after the > statement executes as it was before; only the data is modified.) > Without the CONCURRENTLY clause it's in the same sort of gray area > as TRUNCATE TABLE, where it is essentially DML, but the > implementation details are similar to that of DDL, so it may > sometimes be hard to avoid DDL-like behaviors, even though it would > be best to do so. We have no such problem here. But there's no usecase that makes deferred checks and similar useful afaics. And it seems to me like it certainly will confuse users that a second RMVC fails (via CheckTableNotInUse()) because there's still a deferred trigger queue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services