Thread: Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
From
Heikki Linnakangas
Date:
(moving to pgsql-hackers) On 03/06/10 10:37, Heikki Linnakangas wrote: > However, I'm afraid we're lacking in input validation of read-funcs in > general. ... > > Does anyone have an idea on how > to validate the input in a more wholesale fashion, so that we don't need > to plug these holes one by one? Apparently not :-(. We have two options: 1. Make pg_get_expr() handle arbitrary (possibly even malicious) input gracefully. 2. Restrict pg_get_expr() to superusers only. Does anyone want to argue for option 2? We could create views using pg_get_expr() over the internal catalogs that store trees, so that regular users could access those without being able to pass arbitrary input to pg_get_expr(). However, it would break existing applications, at least pgAdmin uses pg_get_expr(). Assuming we want to make pg_get_expr() check its input, we need to: * plug the hole Rushabh reported, and not crash on premature end of string * check all Node types, so that you e.g. can't pass an Integer in a field that's supposed to hold a CaseWhenExpr * similarly, check that all Lists contain elements of the right type. This can all be done in a straightforward way in readfuncs.c, we just need a bit more decoration to all the READ_* macros. However, that's still not enough; the functions in ruleutils.c make a number of other assumptions, like that an OpExpr always has exactly one or two arguments. That kind of assumptions will need to be explicitly checked. Many of them already have Asserts, they need to be turned into elogs. Thoughts? Attached is a patch for the readfuncs.c changes. Unless someone has a better idea, I'll start going through ruleutils.c and add explicit checks for any unsafe assumptions. It's going to be a lot of work, as there's a lot of code in ruleutils.c and the changes need to be backpatched as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > We have two options: > 1. Make pg_get_expr() handle arbitrary (possibly even malicious) input > gracefully. > 2. Restrict pg_get_expr() to superusers only. I think #1 is a fool's errand. There is far too much structure to a node tree that is outside the scope of what readfuncs.c is capable of understanding. regards, tom lane
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
From
Heikki Linnakangas
Date:
On 09/06/10 17:34, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> We have two options: > >> 1. Make pg_get_expr() handle arbitrary (possibly even malicious) input >> gracefully. > >> 2. Restrict pg_get_expr() to superusers only. > > I think #1 is a fool's errand. There is far too much structure to a > node tree that is outside the scope of what readfuncs.c is capable of > understanding. That's why I said that ruleutils.c will need to understand and complain about the rest. Are you thinking we should retrict pg_get_expr() to superusers then? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
From
Heikki Linnakangas
Date:
On 09/06/10 17:34, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> We have two options: > >> 1. Make pg_get_expr() handle arbitrary (possibly even malicious) input >> gracefully. > >> 2. Restrict pg_get_expr() to superusers only. > > I think #1 is a fool's errand. There is far too much structure to a > node tree that is outside the scope of what readfuncs.c is capable of > understanding. That's why I said that ruleutils.c will need to understand and complain about the rest. Are you thinking we should restrict pg_get_expr() to superusers then? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 09/06/10 17:34, Tom Lane wrote: >> I think #1 is a fool's errand. There is far too much structure to a >> node tree that is outside the scope of what readfuncs.c is capable of >> understanding. > That's why I said that ruleutils.c will need to understand and complain > about the rest. And that's what I'm telling you is a hopeless task. > Are you thinking we should retrict pg_get_expr() to superusers then? I think that's the only solution that will actually fix the problem, rather than lead to a never-ending series of security bugs. In hindsight we should never have exposed that function in that form. regards, tom lane
On Wed, 9 Jun 2010, Heikki Linnakangas wrote: > Are you thinking we should retrict pg_get_expr() to superusers then? > That seems like it will cause problems for both pg_dump and drivers which want to return metadata as pg_get_expr has been the recommended way of fetching this information. The JDBC driver uses pg_get_expr to decode both pg_attrdef.adbin and pg_index.indpred. Kris Jurka
Kris Jurka <books@ejurka.com> writes: > On Wed, 9 Jun 2010, Heikki Linnakangas wrote: >> Are you thinking we should retrict pg_get_expr() to superusers then? > That seems like it will cause problems for both pg_dump and drivers which > want to return metadata as pg_get_expr has been the recommended way of > fetching this information. Yes, it's not a trivial fix either. We'll have to provide functions or views that replace the current usages without letting the user insert untrusted strings. regards, tom lane
On Wed, Jun 9, 2010 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kris Jurka <books@ejurka.com> writes: >> On Wed, 9 Jun 2010, Heikki Linnakangas wrote: >>> Are you thinking we should retrict pg_get_expr() to superusers then? > >> That seems like it will cause problems for both pg_dump and drivers which >> want to return metadata as pg_get_expr has been the recommended way of >> fetching this information. > > Yes, it's not a trivial fix either. We'll have to provide functions or > views that replace the current usages without letting the user insert > untrusted strings. Maybe I'm all wet here, but don't we need to come up with something we can back-patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jun 9, 2010 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yes, it's not a trivial fix either. �We'll have to provide functions or >> views that replace the current usages without letting the user insert >> untrusted strings. > Maybe I'm all wet here, but don't we need to come up with something we > can back-patch? Well, ideally yes, but if it's not actually *secure* then there's no point --- and I don't believe that the approach of making readfuncs.c secure against malicious input has the proverbial snowball's chance of ever being bulletproof. [ thinks for awhile... ] I wonder whether there is any way of locking down pg_get_expr so that it throws an error if called with anything except a suitable field from one of the system catalogs. There are only a few usage patterns that we need to allow, no? At least in recent PG versions it is possible for the function to check that its input expression is a Var. If we had some (probably horridly ugly) way to obtain the rangetable entry the Var refers to, we could put code into pg_get_expr to barf if it's not used in a context like "select pg_get_expr(adbin) from pg_attrdef". regards, tom lane
On Wed, Jun 9, 2010 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Jun 9, 2010 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yes, it's not a trivial fix either. We'll have to provide functions or >>> views that replace the current usages without letting the user insert >>> untrusted strings. > >> Maybe I'm all wet here, but don't we need to come up with something we >> can back-patch? > > Well, ideally yes, but if it's not actually *secure* then there's no > point --- and I don't believe that the approach of making readfuncs.c > secure against malicious input has the proverbial snowball's chance > of ever being bulletproof. I don't really see how it could be *impossible* to securely parse text input. It's certainly possible not to crash on trivially malformed input. Completely validating the input MIGHT cost more in performance than we want to pay in CPU cycles, but I guess I'm not seeing why it would be an unsolvable problem apart from that. > [ thinks for awhile... ] I wonder whether there is any way of locking > down pg_get_expr so that it throws an error if called with anything > except a suitable field from one of the system catalogs. There are only > a few usage patterns that we need to allow, no? At least in recent PG > versions it is possible for the function to check that its input > expression is a Var. If we had some (probably horridly ugly) way to > obtain the rangetable entry the Var refers to, we could put code into > pg_get_expr to barf if it's not used in a context like > "select pg_get_expr(adbin) from pg_attrdef". That's sort of clever... in a really ugly sort of way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jun 9, 2010 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Well, ideally yes, but if it's not actually *secure* then there's no >> point --- and I don't believe that the approach of making readfuncs.c >> secure against malicious input has the proverbial snowball's chance >> of ever being bulletproof. > I don't really see how it could be *impossible* to securely parse text > input. It's certainly possible not to crash on trivially malformed > input. The operative word in that claim is "trivial". The problem that I see is that there are many assumptions in the system about the structure and interrelationships of expression node trees, for instance that certain List fields contain only certain node types. I don't believe that it's practical to make the node reading code enforce every one of those assumptions, or that it'd be maintainable if we did manage to get it right to start with. Certainly we can make the node reading code do more checking than it does now, but the odds of making things bulletproof against malicious input are negligible. I don't want to be going back to fix another hole every other month for the lifetime of the project, but that's exactly what we'll be doing if we try to fix it that way. regards, tom lane
I wrote: > [ thinks for awhile... ] I wonder whether there is any way of locking > down pg_get_expr so that it throws an error if called with anything > except a suitable field from one of the system catalogs. I did a bit of research into this idea. It looks at least somewhat feasible: * All PG versions back to 7.4 will pass the calling expression tree via fcinfo->flinfo->fn_expr. Lack of that would be a showstopper, because we can't change the FmgrInfo struct in back branches (ABI break). So we can get the arguments and determine whether they are Vars. * To determine which table a Var actually refers to, we must have access to the rangetable, and in join cases we also need access to the plan tree node containing the Var (since we have to drill down to the plan node's inputs to resolve OUTER and INNER references). The rangetable is reachable from the PlanState node, so it's sufficient to make that one pointer available to functions. The obvious way to handle this is to add a field to FmgrInfo, and I would suggest doing it that way as of 9.0. In the back branches, we could probably hack it without an ABI break by having fn_expr point to some intermediate node type that contains both the actual expression tree and the PlanState pointer (probably, we'd make it point to FuncExprState instead of directly to the FuncExpr, and then add the field to FuncExprState, which is far less dangerous than redefining struct FmgrInfo). Now this depends on the assumption that no external functions are doing anything with fn_expr except passing it to the related fmgr.c routines, which we would teach about this convention. * Once we've got the above data it's a simple matter of adapting the Var-interpretation logic used by EXPLAIN in order to find out the table OID and column number, if any, represented by the Var. I'd suggest adding such functions in fmgr.c to extend the API currently offered by get_fn_expr_argtype() and friends. It seems possible that other functions besides pg_get_expr might have use for such a capability. What I'm suggesting is definitely not a trivial patch, and I would never consider back-patching it if it weren't a security matter. But I think it's doable and we'd be fixing the hole with a determinate amount of work, whereas trying to verify the validity of pg_get_expr input directly would be a never-ending source of more security bugs. Comments? regards, tom lane
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
From
Heikki Linnakangas
Date:
On 10/06/10 00:24, Tom Lane wrote: > I wrote: >> [ thinks for awhile... ] I wonder whether there is any way of locking >> down pg_get_expr so that it throws an error if called with anything >> except a suitable field from one of the system catalogs. > > I did a bit of research into this idea. It looks at least somewhat > feasible: > > * All PG versions back to 7.4 will pass the calling expression tree via > fcinfo->flinfo->fn_expr. Lack of that would be a showstopper, because > we can't change the FmgrInfo struct in back branches (ABI break). So we > can get the arguments and determine whether they are Vars. > > * To determine which table a Var actually refers to, we must have access > to the rangetable, and in join cases we also need access to the plan > tree node containing the Var (since we have to drill down to the plan > node's inputs to resolve OUTER and INNER references). The rangetable is > reachable from the PlanState node, so it's sufficient to make that one > pointer available to functions. The obvious way to handle this is to > add a field to FmgrInfo, and I would suggest doing it that way as of > 9.0. In the back branches, we could probably hack it without an ABI > break by having fn_expr point to some intermediate node type that > contains both the actual expression tree and the PlanState pointer > (probably, we'd make it point to FuncExprState instead of directly to > the FuncExpr, and then add the field to FuncExprState, which is far less > dangerous than redefining struct FmgrInfo). Now this depends on the > assumption that no external functions are doing anything with fn_expr > except passing it to the related fmgr.c routines, which we would teach > about this convention. > > * Once we've got the above data it's a simple matter of adapting the > Var-interpretation logic used by EXPLAIN in order to find out the table > OID and column number, if any, represented by the Var. I'd suggest > adding such functions in fmgr.c to extend the API currently offered by > get_fn_expr_argtype() and friends. It seems possible that other > functions besides pg_get_expr might have use for such a capability. > > > What I'm suggesting is definitely not a trivial patch, and I would never > consider back-patching it if it weren't a security matter. But I think > it's doable and we'd be fixing the hole with a determinate amount of > work, whereas trying to verify the validity of pg_get_expr input > directly would be a never-ending source of more security bugs. Yeah, seems like it would work. You could avoid changing the meaning of fn_expr by putting the check in the parse analysis phase, into transformFuncCall(). That would feel safer at least for back-branches. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
From
Florian Pflug
Date:
On Jun 15, 2010, at 9:31 , Heikki Linnakangas wrote: > You could avoid changing the meaning of fn_expr by putting the check in the parse analysis phase, into transformFuncCall().That would feel safer at least for back-branches. For 9.0, wouldn't a cleaner way to accomplish this be a seperate type for expressions, say pg_expr, instead of storing themas text? With an input function that unconditionally raises and error and no cast to pg_expr, creating new instancesof that type would be impossible for normal users. The output function and casts to text would call pg_get_expr()with zero as the second argument. The internal representation wouldn't change, it's just the type's OID in the catalog that'd be different. best regards, Florian Pflug
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
From
Heikki Linnakangas
Date:
On 15/06/10 10:31, Heikki Linnakangas wrote: > You could avoid changing the meaning of fn_expr by putting the check in > the parse analysis phase, into transformFuncCall(). That would feel > safer at least for back-branches. Here's a patch using that approach. I grepped through PostgreSQL and pgadmin source code to find the system columns where valid node-strings are stored: pg_index.indexprs pg_index.indprep pg_attrdef.adbin pg_proc.proargdefaults pg_constraint.conbin Am I missing anything? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
On Mon, Jun 21, 2010 at 7:50 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 15/06/10 10:31, Heikki Linnakangas wrote: >> >> You could avoid changing the meaning of fn_expr by putting the check in >> the parse analysis phase, into transformFuncCall(). That would feel >> safer at least for back-branches. > > Here's a patch using that approach. > > I grepped through PostgreSQL and pgadmin source code to find the system > columns where valid node-strings are stored: > > pg_index.indexprs > pg_index.indprep > pg_attrdef.adbin > pg_proc.proargdefaults > pg_constraint.conbin > > Am I missing anything? I think that pg_type.typdefaultbin is used by pg_dump. pg_rewrite.ev_qual, pg_rewrite.ev_action, pg_trigger.tgqual also contain nodeToString() output but I didn't have any luck using them with pg_get_expr() so maybe they don't need to be included. The only other thing I notice is that, obviously, the FIXME comment needs to be FIXMEd before commit. I'd still be in favor of inserting at least some basic error checks into readfuncs.c, though just in HEAD. The restrictions implemented here seem adequate to prevent a security vulnerability, but superusers can still invoke those functions manually, and while superusers can clearly crash the system in any number of ways, that doesn't seem (to me) like an adequate justification for ignoring the return value of strtok(). YMMV, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
From
Heikki Linnakangas
Date:
On 15/06/10 15:19, Florian Pflug wrote: > On Jun 15, 2010, at 9:31 , Heikki Linnakangas wrote: >> You could avoid changing the meaning of fn_expr by putting the check in the parse analysis phase, into transformFuncCall().That would feel safer at least for back-branches. > > For 9.0, wouldn't a cleaner way to accomplish this be a seperate type for expressions, say pg_expr, instead of storingthem as text? With an input function that unconditionally raises and error and no cast to pg_expr, creating new instancesof that type would be impossible for normal users. The output function and casts to text would call pg_get_expr()with zero as the second argument. > > The internal representation wouldn't change, it's just the type's OID in the catalog that'd be different. Yeah, that would be quite elegant. I think it's too late for 9.0, but something to consider in the future. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] Server crash while trying to read expression using pg_get_expr()
From
Heikki Linnakangas
Date:
On 23/06/10 21:36, Robert Haas wrote: > On Mon, Jun 21, 2010 at 7:50 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 15/06/10 10:31, Heikki Linnakangas wrote: >>> >>> You could avoid changing the meaning of fn_expr by putting the check in >>> the parse analysis phase, into transformFuncCall(). That would feel >>> safer at least for back-branches. >> >> Here's a patch using that approach. >> >> I grepped through PostgreSQL and pgadmin source code to find the system >> columns where valid node-strings are stored: >> >> pg_index.indexprs >> pg_index.indprep >> pg_attrdef.adbin >> pg_proc.proargdefaults >> pg_constraint.conbin >> >> Am I missing anything? > > I think that pg_type.typdefaultbin is used by pg_dump. Yep, added that. > pg_rewrite.ev_qual, pg_rewrite.ev_action, pg_trigger.tgqual also > contain nodeToString() output but I didn't have any luck using them > with pg_get_expr() so maybe they don't need to be included. I left them out. > The only other thing I notice is that, obviously, the FIXME comment > needs to be FIXMEd before commit. Fixed. > I'd still be in favor of inserting at least some basic error checks > into readfuncs.c, though just in HEAD. The restrictions implemented > here seem adequate to prevent a security vulnerability, but superusers > can still invoke those functions manually, and while superusers can > clearly crash the system in any number of ways, that doesn't seem (to > me) like an adequate justification for ignoring the return value of > strtok(). YMMV, of course. Agreed. I'll do that as a separate patch. Thanks for the review! -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com