Re: Is MinMaxExpr really leakproof? - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Is MinMaxExpr really leakproof?
Date
Msg-id 20190102205950.GN2528@tamriel.snowman.net
Whole thread Raw
In response to Re: Is MinMaxExpr really leakproof?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Is MinMaxExpr really leakproof?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> Noah Misch <noah@leadboat.com> writes:
> >>> Either of those solutions sounds fine.  Like last time, I'll vote for explicitly
> >>> verifying leakproofness.
>
> >> Yeah, I'm leaning in that direction as well.  Other than comparisons
> >> involving strings, it's not clear that we'd gain much from insisting
> >> on leakproofness in general, and it seems like it might be rather a
> >> large burden to put on add-on data types.
>
> > While I'd actually like it if we required leakproofness for what we
> > ship, I agree that we shouldn't blindly assume that add-on data types
> > are always leakproof and that then requires that we explicitly verify
> > it.  Perhaps an argument can be made that there are some cases where
> > what we ship can't or shouldn't be leakproof for usability but, ideally,
> > those would be relatively rare exceptions that don't impact common
> > use-cases.
>
> Seems like we have consensus that MinMaxExpr should verify leakproofness
> rather than just assume it, so I'll go fix that.
>
> What's your opinion on the question of whether to try to make text_cmp
> et al leakproof?  I notice that texteq/textne are (correctly) marked
> leakproof, so perhaps the usability issue isn't as pressing as I first
> thought; but it remains true that there are fairly common cases where
> the current marking is going to impede optimization.  I also realized
> that in the wake of 586b98fdf, we have to remove the leakproofness
> marking of name_cmp and name inequality comparisons --- which I did
> at d01e75d68, but that's potentially a regression in optimizability
> of catalog queries, so it's not very nice.

Well, as mentioned, I'd really be happier to have more things be
leakproof, when they really are leakproof.  What we've done in some
places, and I'm not sure how practical this is elsewhere, is to show
data when we know the user is allowed to see it anyway, to aide in
debugging and such (I'm thinking here specifically of
BuildIndexValueDescription(), which will just return a NULL in the case
where the user doesn't have permission to view all of the columns
involved).  As these are error cases, we're generally happy to consider
spending a bit of extra time to figure that out, is there something
similar we could do for these cases where we'd really like to report
useful information to the user, but only if we think they're probably
allowed to see it?

> Also, I believe that Peter's work towards making text equality potentially
> collation-sensitive will destroy the excuse for marking texteq/textne
> leakproof if we're going to be 100% rigid about that, and that would be
> a very big regression.

That could be a serious problem, I agree.

> So I'd like to get to a point where we're comfortable marking these
> functions leakproof despite the possibility of corner-case failures.
> We could just decide that the existing failure cases in varstr_cmp are
> not usefully exploitable for information leakage, or perhaps we could
> dumb down the error messages some more to make them even less so.
> It'd also be nice to have some articulatable policy that encompasses
> a choice like this.

I'd rather not say "well, these are mostly leakproof and therefore it's
good enough" unless those corner-case failures you're referring to are
really "this system call isn't documented to ever fail in a way we can't
handle, but somehow it did and we're blowing up because of it."

At least, in the cases where we're actually leaking knowledge that we
shouldn't be.  If what we're leaking is some error being returned where
all we're returning is an error code and not the actual data then that
doesn't seem like it's really much of a leak to me..?  I'm just glancing
through varstr_cmp and perhaps I'm missing something but it seems like
everywhere we're returning an error, at least from there, it's an error
code of some kind being returned and not the data that was passed in to
the function.  I didn't spend a lot of time hunting through it though.

Thanks!

Stephen

Attachment

pgsql-hackers by date:

Previous
From: James Robinson
Date:
Subject: Arrays of domain returned to client as non-builtin oid describing thearray, not the base array type's oid
Next
From: Peter Eisentraut
Date:
Subject: Re: log bind parameter values on error