Thread: Re: cannot restore schema with is not distinct from on hstore since PG 9.6.8

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 07/09/2018 11:34 AM, Tom Lane wrote:
>> I think the most practical way to deal with this probably is to change
>> the parser so that the lookup works by finding a default btree or hash
>> opclass rather than by looking for "=" by name.  We've made similar
>> changes in the past to get rid of implicit dependencies on operator
>> names, but those efforts never reached IS [NOT] DISTINCT FROM.

> I agree with your approach, including backpatching. I guess we'll have
> to try to think of some scenario that backpatching would break. Maybe to
> minimize any effect it should fall back on a default opclass if the
> search for "=" fails? Dunno how practical that is.

I poked into this area for awhile, and it turns out to be even a
worse can of worms than I thought.  I looked through gram.y and
parse_expr.c, and identified several distinct classes of issue.
(I'm not promising that I found everything.)

1. LIKE, ILIKE, SIMILAR TO are converted to unqualified operator names
"~~", "~~*", etc.  One might reflexively propose forcing these expansions
to be qualified with pg_catalog, but I'm afraid that would break use-cases
involving non-core data types, where the desired operator might well not
be in pg_catalog.  However, I think that there's not actually any new
dump/restore hazard here, because of the fact that ruleutils.c simply
prints the results as the internal operator names without trying to
convert back to LIKE et al.  If the operator isn't in the search path
it will get schema-qualified, and all's well.  So my inclination in this
category is to do nothing.  There's certainly a hazard for naive writers
of SQL, who may not realize that LIKE entails a search-path-dependent
lookup; but that hazard is no worse than it was before we decided to
lock down dump/restore search paths.

2. BETWEEN and related constructs are converted to boolean expressions
involving unqualified operator names ">=" etc.  As above, we are saved
by the fact that what will be dumped out is the expanded expression,
providing room to schema-qualify the selected operators if needed.
So again I'm inclined to leave this alone.  (This will be something
to think about if we ever get around to re-implementing BETWEEN as a
unitary construct, though.)

3. IN and NOT IN are converted to "= ANY" or "<> ALL" with unqualified
operator names.  This is almost like the above cases, except that
ruleutils.c sometimes chooses to print the result using "IN" rather than
with the actual operator name.  We might have to back off that readability
hack in some places (though it seems like it'd be OK if what would be
printed is exactly unqualified "= ANY" or "<> ALL").

4. As per original report, IS [NOT] DISTINCT FROM, as well as NULLIF,
are interpreted by doing a binary operator lookup using unqualified "="
as the operator name.  My original thought of replacing that by using
default operator classes to identify the comparison semantics doesn't
really work, because there is no requirement that the two input values be
of the same datatype.  For example, "int_var IS DISTINCT FROM numeric_var"
is handled perfectly well at present, but I see no principled way to
figure out what to do with that just by reference to operator classes.
We might have little choice but to invent a schema-qualifiable variant
syntax so we can represent these constructs unambiguously in dumps.
(Ugly as that is, it does have the advantage that we'd not be risking
subtle changes in the semantics of these operations.)

5. Row comparison constructs, for example ROW(a,b) < ROW(c,d), are handled
by doing column-by-column lookups using the specified operator name; this
example devolves to what "a < c" and "b < d" would be interpreted as.
Although we support OPERATOR() syntax in these constructs, that doesn't
get us far: if the original input was not a schema-qualified operator
then it's entirely possible that the per-column operators were found in
different schemas, which we can't handle by writing OPERATOR() in the
middle.  (There's already a comment in ruleutils.c's handling of
RowCompareExpr complaining about this.)

One idea for resolving that is to extend the OPERATOR syntax to allow
multiple operator names for row comparisons, along the lines of
    ROW(a,b) OPERATOR(pg_catalog.<, public.<) ROW(c,d)
This seems probably to be not terribly hard, although back-patching it
wouldn't be appetizing.

6. The row comparison problem also manifests for multiple-column cases
of ANY_SUBLINK and ALL_SUBLINK, for instance
    WHERE (a, b) IN (SELECT x, y FROM ...)
Again, ruleutils.c contributes to the issue by printing "IN" even
though it knows that that isn't 100% accurate.  But I think that here
again, a possible fix is to allow writing
    WHERE (a, b) OPERATOR(pg_catalog.=, public.=) ANY (SELECT x, y FROM ...)
so that the operators to use can be locked down exactly.

7. The implicit-comparison form of CASE also resolves comparisons by
doing lookups with an assumed operator name of "=".  For instance
      CASE x WHEN 4 THEN ... WHEN 5.0 THEN ... END
might well choose different comparison operators for the two WHEN
clauses.  We could imagine printing this as
      CASE WHEN x = 4 THEN ... WHEN x = 5.0 THEN ... END
and schema-qualifying the operators as needed, but that's really
totally unsatisfactory if the test expression is expensive or volatile.
So I'm not sure what to do about this, but it seems like any real
answer will again involve introducing new syntax.


So this is all pretty messy, but on the bright side, fixing it would allow
cleaning up some ancient squishy coding in ruleutils.c.  It wouldn't be
controversial as just a v12 addition, perhaps ... but do we have a choice
about back-patching?  Dump/restore failures are not good.

            regards, tom lane


Re: cannot restore schema with is not distinct from on hstore sincePG 9.6.8

From
"David G. Johnston"
Date:
On Fri, Jul 13, 2018 at 1:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So this is all pretty messy, but on the bright side, fixing it would allow
cleaning up some ancient squishy coding in ruleutils.c.  It wouldn't be
controversial as just a v12 addition, perhaps ... but do we have a choice
about back-patching?  Dump/restore failures are not good.

I think serious consideration needs to be given to ways to allow the user of pg_dump/pg_restore to choose the prior, less secure, mode of operation​.

IMO the risk surface presented to support back-patching the behavioral changes was not severe enough to do so in the first place.  I'm presuming undoing the back-patch will be shot down without mercy but at least consider an escape hatch for unafflicted secure systems that just happen to depend on search_path more than a super-hardened system would.

David J.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> I think serious consideration needs to be given to ways to allow the user
> of pg_dump/pg_restore to choose the prior, less secure, mode of operation​.
> IMO the risk surface presented to support back-patching the behavioral
> changes was not severe enough to do so in the first place.  I'm presuming
> undoing the back-patch will be shot down without mercy but at least
> consider an escape hatch for unafflicted secure systems that just happen to
> depend on search_path more than a super-hardened system would.

FWIW, in the security team's discussions of CVE-2018-1058, I argued
strenuously in favor of providing a way to run pg_dump/pg_restore with
the system's default search_path as before.  I lost the argument;
but maybe the need for features like this shows that we are not really
ready to insist on unconditional security there.

            regards, tom lane


Re: cannot restore schema with is not distinct from on hstore sincePG 9.6.8

From
Andrew Dunstan
Date:

On 07/13/2018 05:23 PM, Tom Lane wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>> I think serious consideration needs to be given to ways to allow the user
>> of pg_dump/pg_restore to choose the prior, less secure, mode of operation​.
>> IMO the risk surface presented to support back-patching the behavioral
>> changes was not severe enough to do so in the first place.  I'm presuming
>> undoing the back-patch will be shot down without mercy but at least
>> consider an escape hatch for unafflicted secure systems that just happen to
>> depend on search_path more than a super-hardened system would.
> FWIW, in the security team's discussions of CVE-2018-1058, I argued
> strenuously in favor of providing a way to run pg_dump/pg_restore with
> the system's default search_path as before.  I lost the argument;
> but maybe the need for features like this shows that we are not really
> ready to insist on unconditional security there.
>
>             

I don't remember that, TBH.

Certainly this problem seems nasty enough that we should possibly 
revisit the issue.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



[ just to tie back to this old thread ]

I wrote:
> I poked into this area for awhile, and it turns out to be even a
> worse can of worms than I thought.  I looked through gram.y and
> parse_expr.c, and identified several distinct classes of issue.
> (I'm not promising that I found everything.)

In fact, the thread at [1] identifies an essentially similar class
of issue that I missed: JOIN ... USING (x) also implicitly looks up
an equality operator by seeing what "tab1.x = tab2.x" resolves as.
This is much like the CASE situation, in that there's no
SQL-standard-compliant way to reverse-list a view containing
such a construct while showing how it was resolved.  We'd need
to invent some new syntax if we want to make this safer.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAC35HNnNGavaZ%3DP%3DrUcwTwYEhfoyXDg32REXCRDgxBmC3No3nA%40mail.gmail.com