Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. |
Date | |
Msg-id | 20130801042216.GA331932@tornado.leadboat.com Whole thread Raw |
In response to | Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. (Andres Freund <andres@2ndquadrant.com>) |
List | pgsql-hackers |
On Wed, Jul 31, 2013 at 03:39:14AM +0200, Andres Freund wrote: > On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote: > > Andres Freund <andres@2ndquadrant.com> wrote: > > > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance > > > be done a good bit earlier? We're executing queries before > > > that. > > > > This can't be in effect while we are creating temp tables. If > > we're going to support a differential REFRESH technique, it must be > > done more "surgically" than to wrap the whole REFRESH statement > > execution. That InSecurityRestrictedOperation() check happens in DefineRelation(); I advise bypassing it by calling heap_create_with_catalog() directly. make_new_heap() is a comparable example. It's a good thing for more reasons than the one prompting this. For example, the current implementation will fail if the caller lacks TEMP permission on the database. Though it's probably a rare thing to deny TEMP but allow creating permanent objects such as a MV, that's a wart. > Sure, it cannot be active when creating the temp table. But it's > currently inactive while you SPI_execute queries. That can't be right. It may be okay once the operator search issues Andres notes later are fixed. The queries that now run outside a security restricted context _should_ call only functions associated with default B-tree or hash equality operators. Since only superusers can define operator classes, we can assume that those operators never do something unseemly. But if the code will indeed be safe for that reason, I would hope for a comment explaining as much. That would also make the SetUserIdAndSecContext() calls later in refresh_by_match_merge() unnecessary. All that being said, the implementation would be more robust if we remove the SQL-level use of CREATE TEMP TABLE and return to running the entire operation as the MV owner. > > > * All not explicitly looked up operators should be qualified > > > using OPERATOR(). It's easy to define your own = operator for > > > tid and set the search_path to public,pg_catalog. Now, the > > > whole thing is restricted to talbe owners afaics, making this > > > decidedly less dicey, but still. But if anyobdy actually uses a > > > search_path like the above you can easily hurt them. > > > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM > > > x.* operations. > > > * (y.*) = (x.*) also needs to do proper operator lookups. A few supplements to Andres's comments: 1. ri_triggers.c is the model for constructing queries in a manner that guarantees uniform semantics in the face of overloading and search_path variations. Check out how it handles these issues. 2. DISTINCT FROM and IN syntax cannot be used in this kind of query at all, because they always depend on the search path. However, you could construct your own DistinctExpr node referencing the operator you want. Here's an SQL statement generated by this code that currently uses "rowvar IS DISTINCT FROM rowvar": CREATE TEMP TABLE pg_temp_1.pg_temp_41108_2 AS SELECT x.ctid AS tid, y FROM public.mv x FULL JOIN pg_temp_1.pg_temp_41108 y ON(y.key OPERATOR(pg_catalog.=) x.key AND y = x) WHERE (y.*) IS DISTINCT FROM (x.*) ORDER BY tid The point of the "(y.*) IS DISTINCT FROM (x.*)" is just to filter out the inner portion of the full join, right? Could write that a different way. > I think for the cases where you're comparing tids it's fine to just to > hardcode the operator to OPERATOR(pg_catalog.=). Yes, that's fine whenever you're dealing with a known data type that uses "=" as its default B-tree equality operator (e.g. nearly every built-in type). So "(x.*) OPERATOR(pg_catalog.=) (y.*)" for records is fine, too. Kevin, this code is aware that "rowvar = rowvar" uses DISTINCT FROM semantics for individual columns, right? The block comment at refresh_by_match_merge() talks about NULL comparisons, but I couldn't tell definitively whether it recognizes that fact. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: