Thread: Re: cannot restore schema with is not distinct from on hstore since PG 9.6.8
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 07/09/2018 11:34 AM, Tom Lane wrote: >> I think the most practical way to deal with this probably is to change >> the parser so that the lookup works by finding a default btree or hash >> opclass rather than by looking for "=" by name. We've made similar >> changes in the past to get rid of implicit dependencies on operator >> names, but those efforts never reached IS [NOT] DISTINCT FROM. > I agree with your approach, including backpatching. I guess we'll have > to try to think of some scenario that backpatching would break. Maybe to > minimize any effect it should fall back on a default opclass if the > search for "=" fails? Dunno how practical that is. I poked into this area for awhile, and it turns out to be even a worse can of worms than I thought. I looked through gram.y and parse_expr.c, and identified several distinct classes of issue. (I'm not promising that I found everything.) 1. LIKE, ILIKE, SIMILAR TO are converted to unqualified operator names "~~", "~~*", etc. One might reflexively propose forcing these expansions to be qualified with pg_catalog, but I'm afraid that would break use-cases involving non-core data types, where the desired operator might well not be in pg_catalog. However, I think that there's not actually any new dump/restore hazard here, because of the fact that ruleutils.c simply prints the results as the internal operator names without trying to convert back to LIKE et al. If the operator isn't in the search path it will get schema-qualified, and all's well. So my inclination in this category is to do nothing. There's certainly a hazard for naive writers of SQL, who may not realize that LIKE entails a search-path-dependent lookup; but that hazard is no worse than it was before we decided to lock down dump/restore search paths. 2. BETWEEN and related constructs are converted to boolean expressions involving unqualified operator names ">=" etc. As above, we are saved by the fact that what will be dumped out is the expanded expression, providing room to schema-qualify the selected operators if needed. So again I'm inclined to leave this alone. (This will be something to think about if we ever get around to re-implementing BETWEEN as a unitary construct, though.) 3. IN and NOT IN are converted to "= ANY" or "<> ALL" with unqualified operator names. This is almost like the above cases, except that ruleutils.c sometimes chooses to print the result using "IN" rather than with the actual operator name. We might have to back off that readability hack in some places (though it seems like it'd be OK if what would be printed is exactly unqualified "= ANY" or "<> ALL"). 4. As per original report, IS [NOT] DISTINCT FROM, as well as NULLIF, are interpreted by doing a binary operator lookup using unqualified "=" as the operator name. My original thought of replacing that by using default operator classes to identify the comparison semantics doesn't really work, because there is no requirement that the two input values be of the same datatype. For example, "int_var IS DISTINCT FROM numeric_var" is handled perfectly well at present, but I see no principled way to figure out what to do with that just by reference to operator classes. We might have little choice but to invent a schema-qualifiable variant syntax so we can represent these constructs unambiguously in dumps. (Ugly as that is, it does have the advantage that we'd not be risking subtle changes in the semantics of these operations.) 5. Row comparison constructs, for example ROW(a,b) < ROW(c,d), are handled by doing column-by-column lookups using the specified operator name; this example devolves to what "a < c" and "b < d" would be interpreted as. Although we support OPERATOR() syntax in these constructs, that doesn't get us far: if the original input was not a schema-qualified operator then it's entirely possible that the per-column operators were found in different schemas, which we can't handle by writing OPERATOR() in the middle. (There's already a comment in ruleutils.c's handling of RowCompareExpr complaining about this.) One idea for resolving that is to extend the OPERATOR syntax to allow multiple operator names for row comparisons, along the lines of ROW(a,b) OPERATOR(pg_catalog.<, public.<) ROW(c,d) This seems probably to be not terribly hard, although back-patching it wouldn't be appetizing. 6. The row comparison problem also manifests for multiple-column cases of ANY_SUBLINK and ALL_SUBLINK, for instance WHERE (a, b) IN (SELECT x, y FROM ...) Again, ruleutils.c contributes to the issue by printing "IN" even though it knows that that isn't 100% accurate. But I think that here again, a possible fix is to allow writing WHERE (a, b) OPERATOR(pg_catalog.=, public.=) ANY (SELECT x, y FROM ...) so that the operators to use can be locked down exactly. 7. The implicit-comparison form of CASE also resolves comparisons by doing lookups with an assumed operator name of "=". For instance CASE x WHEN 4 THEN ... WHEN 5.0 THEN ... END might well choose different comparison operators for the two WHEN clauses. We could imagine printing this as CASE WHEN x = 4 THEN ... WHEN x = 5.0 THEN ... END and schema-qualifying the operators as needed, but that's really totally unsatisfactory if the test expression is expensive or volatile. So I'm not sure what to do about this, but it seems like any real answer will again involve introducing new syntax. So this is all pretty messy, but on the bright side, fixing it would allow cleaning up some ancient squishy coding in ruleutils.c. It wouldn't be controversial as just a v12 addition, perhaps ... but do we have a choice about back-patching? Dump/restore failures are not good. regards, tom lane
Re: cannot restore schema with is not distinct from on hstore sincePG 9.6.8
From
"David G. Johnston"
Date:
So this is all pretty messy, but on the bright side, fixing it would allow
cleaning up some ancient squishy coding in ruleutils.c. It wouldn't be
controversial as just a v12 addition, perhaps ... but do we have a choice
about back-patching? Dump/restore failures are not good.
I think serious consideration needs to be given to ways to allow the user of pg_dump/pg_restore to choose the prior, less secure, mode of operation.
IMO the risk surface presented to support back-patching the behavioral changes was not severe enough to do so in the first place. I'm presuming undoing the back-patch will be shot down without mercy but at least consider an escape hatch for unafflicted secure systems that just happen to depend on search_path more than a super-hardened system would.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > I think serious consideration needs to be given to ways to allow the user > of pg_dump/pg_restore to choose the prior, less secure, mode of operation. > IMO the risk surface presented to support back-patching the behavioral > changes was not severe enough to do so in the first place. I'm presuming > undoing the back-patch will be shot down without mercy but at least > consider an escape hatch for unafflicted secure systems that just happen to > depend on search_path more than a super-hardened system would. FWIW, in the security team's discussions of CVE-2018-1058, I argued strenuously in favor of providing a way to run pg_dump/pg_restore with the system's default search_path as before. I lost the argument; but maybe the need for features like this shows that we are not really ready to insist on unconditional security there. regards, tom lane
Re: cannot restore schema with is not distinct from on hstore sincePG 9.6.8
From
Andrew Dunstan
Date:
On 07/13/2018 05:23 PM, Tom Lane wrote: > "David G. Johnston" <david.g.johnston@gmail.com> writes: >> I think serious consideration needs to be given to ways to allow the user >> of pg_dump/pg_restore to choose the prior, less secure, mode of operation. >> IMO the risk surface presented to support back-patching the behavioral >> changes was not severe enough to do so in the first place. I'm presuming >> undoing the back-patch will be shot down without mercy but at least >> consider an escape hatch for unafflicted secure systems that just happen to >> depend on search_path more than a super-hardened system would. > FWIW, in the security team's discussions of CVE-2018-1058, I argued > strenuously in favor of providing a way to run pg_dump/pg_restore with > the system's default search_path as before. I lost the argument; > but maybe the need for features like this shows that we are not really > ready to insist on unconditional security there. > > I don't remember that, TBH. Certainly this problem seems nasty enough that we should possibly revisit the issue. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[ just to tie back to this old thread ] I wrote: > I poked into this area for awhile, and it turns out to be even a > worse can of worms than I thought. I looked through gram.y and > parse_expr.c, and identified several distinct classes of issue. > (I'm not promising that I found everything.) In fact, the thread at [1] identifies an essentially similar class of issue that I missed: JOIN ... USING (x) also implicitly looks up an equality operator by seeing what "tab1.x = tab2.x" resolves as. This is much like the CASE situation, in that there's no SQL-standard-compliant way to reverse-list a view containing such a construct while showing how it was resolved. We'd need to invent some new syntax if we want to make this safer. regards, tom lane [1] https://www.postgresql.org/message-id/flat/CAC35HNnNGavaZ%3DP%3DrUcwTwYEhfoyXDg32REXCRDgxBmC3No3nA%40mail.gmail.com