Thread: Sloppy thinking about leakproof properties of opclass co-members
It strikes me that there's a significant gap in the whole "leakproof function" business, namely that no consideration has been given to planner-driven transformations of queries. As an example, if we have "a = b" and "b = c", the planner may generate and apply "a = c" instead of one or both of those clauses. If a, b, c are not all the same type, "a = c" might involve an operator that's not either of the original ones. And it's possible that that operator is not leakproof where the original ones are. This could easily result in introducing non-leakproof operations into a secure subquery after pushdown of a clause that was marked secure. Another example is that in attempting to make implication or refutation proofs involving operator clauses, the planner feels free to apply other members of the operator's btree opclass (if it's in one). I've not bothered to try to create a working exploit, but I'm pretty sure that this could result in a non-leakproof function being applied during planning of a supposedly secure subquery. It might be that that couldn't leak anything worse than constant values within the query tree, but perhaps it could leak data values from a protected table's pg_statistic entries. ISTM that the most appropriate solution here is to insist that all or none of the members of an operator class be marked leakproof. (Possibly we could restrict that to btree opclasses, but I'm not sure any exception is needed for other index types.) I looked for existing violations of this precept, and unfortunately found a *lot*. For example, texteq is marked leakproof but its fellow text comparison operators aren't. Is that really sane? Here's a query to find problematic cases: select p1.proname,o1.oprname,p2.proname,o2.oprname,a1.amopfamily from pg_proc p1, pg_operator o1, pg_amop a1, pg_proc p2, pg_operator o2, pg_amop a2 where p1.oid = o1.oprcode and p2.oid = o2.oprcode and o1.oid = a1.amopopr and o2.oid = a2.amopopr and a1.amopfamily = a2.amopfamily and p1.proleakproof and not p2.proleakproof; Oh ... and just to add insult to injury, the underlying opclass support functions (such as textcmp and hashtext) are generally not marked leakproof even when the operators they may be executed instead of are. This is even more silly. regards, tom lane
On Wed, Sep 24, 2014 at 5:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It strikes me that there's a significant gap in the whole "leakproof > function" business, namely that no consideration has been given to > planner-driven transformations of queries. As an example, if we > have "a = b" and "b = c", the planner may generate and apply "a = c" > instead of one or both of those clauses. If a, b, c are not all the > same type, "a = c" might involve an operator that's not either of the > original ones. And it's possible that that operator is not leakproof > where the original ones are. This could easily result in introducing > non-leakproof operations into a secure subquery after pushdown of a > clause that was marked secure. > > Another example is that in attempting to make implication or refutation > proofs involving operator clauses, the planner feels free to apply other > members of the operator's btree opclass (if it's in one). I've not > bothered to try to create a working exploit, but I'm pretty sure that > this could result in a non-leakproof function being applied during > planning of a supposedly secure subquery. It might be that that couldn't > leak anything worse than constant values within the query tree, but > perhaps it could leak data values from a protected table's pg_statistic > entries. > > ISTM that the most appropriate solution here is to insist that all or none > of the members of an operator class be marked leakproof. (Possibly we > could restrict that to btree opclasses, but I'm not sure any exception is > needed for other index types.) I looked for existing violations of this > precept, and unfortunately found a *lot*. For example, texteq is marked > leakproof but its fellow text comparison operators aren't. Is that really > sane? Not really. Fortunately, AFAICT, most if not all of these are in the good direction: there are some things not marked leakproof that can be so marked. The reverse direction would be a hard-to-fix security hole. I think at some point somebody went through and tried to mark all of the same-type equality operators as leakproof, and it seems like that got expanded somewhat without fully rationalizing what we had in pg_proc... mostly because I think nobody had a clear idea how to do that. I think your proposal here is a good one. Heikki proposed treating opclass functions as leakproof *in lieu of* adding a flag, but that didn't seem good because we wanted to allow for the possibility of cases where that wasn't true, and ensure individual scrutiny of each case. Your idea of making sure the flag is set consistently throughout the opclass (opfamily?) is similar in spirit, but better in detail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Sep 24, 2014 at 5:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > ISTM that the most appropriate solution here is to insist that all or none > > of the members of an operator class be marked leakproof. (Possibly we > > could restrict that to btree opclasses, but I'm not sure any exception is > > needed for other index types.) I looked for existing violations of this > > precept, and unfortunately found a *lot*. For example, texteq is marked > > leakproof but its fellow text comparison operators aren't. Is that really > > sane? > > Not really. Fortunately, AFAICT, most if not all of these are in the > good direction: there are some things not marked leakproof that can be > so marked. The reverse direction would be a hard-to-fix security > hole. I think at some point somebody went through and tried to mark > all of the same-type equality operators as leakproof, and it seems > like that got expanded somewhat without fully rationalizing what we > had in pg_proc... mostly because I think nobody had a clear idea how > to do that. We'll need to investigate and see if there are any which *aren't* safe and probably fix those to be safe rather than trying to deal with this risk in some other way. In other words, I hope it's really "all of these" rather than just most. In general, I've been hoping that we have more leakproof functions than not as, while it's non-trivial to write them and ensure they're correct, it's better for us to take that on than to ask our users to do so (or have them get some set that someone else wrote off of a website or even the extension network). We can't cover everything, of course, but hopefully we'll cover all reasonable use cases for types we ship. > I think your proposal here is a good one. Heikki proposed treating > opclass functions as leakproof *in lieu of* adding a flag, but that > didn't seem good because we wanted to allow for the possibility of > cases where that wasn't true, and ensure individual scrutiny of each > case. Your idea of making sure the flag is set consistently > throughout the opclass (opfamily?) is similar in spirit, but better in > detail. Agreed- a regression test here is definitely needed and have any exceptions which must exist, after we've determined that they don't produce an actual security hole (not sure how they wouldn't, but still) explicitly called out in the regression tests, to avoid individuals missing this requirement by copying an existing pg_proc example and thinking they can add a new opclass item (or change an existing one) to not be leakproof when the rest is. Thanks, Stephen
On 26 September 2014 15:48, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Wed, Sep 24, 2014 at 5:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > ISTM that the most appropriate solution here is to insist that all or none >> > of the members of an operator class be marked leakproof. (Possibly we >> > could restrict that to btree opclasses, but I'm not sure any exception is >> > needed for other index types.) I looked for existing violations of this >> > precept, and unfortunately found a *lot*. For example, texteq is marked >> > leakproof but its fellow text comparison operators aren't. Is that really >> > sane? >> >> Not really. Fortunately, AFAICT, most if not all of these are in the >> good direction: there are some things not marked leakproof that can be >> so marked. The reverse direction would be a hard-to-fix security >> hole. I think at some point somebody went through and tried to mark >> all of the same-type equality operators as leakproof, and it seems >> like that got expanded somewhat without fully rationalizing what we >> had in pg_proc... mostly because I think nobody had a clear idea how >> to do that. > > We'll need to investigate and see if there are any which *aren't* safe > and probably fix those to be safe rather than trying to deal with this > risk in some other way. In other words, I hope it's really "all of > these" rather than just most. In general, I've been hoping that we have > more leakproof functions than not as, while it's non-trivial to write > them and ensure they're correct, it's better for us to take that on than > to ask our users to do so (or have them get some set that someone else > wrote off of a website or even the extension network). We can't cover > everything, of course, but hopefully we'll cover all reasonable use > cases for types we ship. > Looking at other functions, ISTM that there's an entire class of functions that can trivially be marked leakproof, and that's no-arg functions which can't possibly leak. There are quite a few no-arg builtin functions and none of them are currently marked leakproof. Rather than (or perhaps as well as) marking all these leakproof, perhaps we should teach contain_leaky_functions() to automatically treat any no-arg function as leakproof, so that we allow user-defined functions too. Taking that one step further, perhaps what it should really be looking for is Vars in the argument list of a leaky function, i.e., contain_leaked_vars() rather than contain_leaky_functions(). Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > Rather than (or perhaps as well as) marking all these leakproof, > perhaps we should teach contain_leaky_functions() to automatically > treat any no-arg function as leakproof, so that we allow user-defined > functions too. Taking that one step further, perhaps what it should > really be looking for is Vars in the argument list of a leaky > function, i.e., contain_leaked_vars() rather than > contain_leaky_functions(). +1, but that's a totally independent question from what I'm on about at the moment. regards, tom lane