Thread: Facility for detecting insecure object naming
Commit e09144e documented the rules for writing safe qualified names, but those rules are tedious to apply in practice. Spotting the defects in this function definition (from an unpublished draft intended for https://postgr.es/m/20180710014308.GA805781@rfd.leadboat.com) is, I think, too hard: CREATE FUNCTION latitude(earth) RETURNS float8 LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT CASE WHEN @cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./) @extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8 WHEN @cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./) @extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8 ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./) @extschema@.earth())) END'; If hackers and non-core extension authors are to write such code, let's make it easier to check the work. Different classes of code need different checks. In each class, qualified function and operator names referring to untrusted schemas need an exact match of function parameters, including any VARIADIC. Class-specific rules: a. SQL intended to run under secure search_path. No class-specific rules. src/bin code is an example of this class, and this is the only secure class for end-user applications. b. SQL intended for a particular search_path, possibly untrusted. Unqualified names need an exact match. Use a qualified name for any object whose schema appears in search_path later than some untrusted schema. Examples of this class include extension scripts, pre-CVE-2018-1058 pg_dump, some functions with "SET search_path", and many casual end-user applications. c. SQL intended to work the same under every search_path setting. Do not use any unqualified name. Most pg_catalog and contrib functions, but not those having a "SET search_path" annotation, are examples of this class. I believe PostgreSQL can apply each class's rules given a list of trusted schemas and a flag to enable the checks. Class (b) naturally degenerates to class (a) if every schema of search_path appears in the whitelist. To check class-(c) code, issue "SET search_path = not_in_whitelist, pg_temp, pg_catalog, ..." before the test queries. That's something of a hack, but I didn't think of a non-hack that I liked better. Should this feature warn about "SELECT 'earth()'::pg_catalog.regprocedure" under the conditions that would make it warn about "SELECT earth()"? Should it offer the option to warn or not warn? Some uses of reg*, e.g. appendQualifiedRelation(), would consider those to be false positives. Then there's the question of exact UI naming. Some possibilities: SET lint_trusted_schemas = pg_catalog, admin SET lint = reg*, exact_match, qualified_name SET lint = all SET lint = '' SET lint_trusted_schemas = pg_catalog, admin SET lint_name_security = on SET name_security_warning_trusted_schemas = pg_catalog, admin SET name_security_warning = on SET name_security_warnings_trusted_schemas = pg_catalog, admin SET warnings = reg*, exact_match, qualified_name Preferences, other ideas?
On Sun, Aug 5, 2018 at 1:34 PM, Noah Misch <noah@leadboat.com> wrote: > If hackers and non-core extension authors are to write such code, let's make > it easier to check the work. +1. Better still would be to invent a way to remove the need for such onerous qualification, but I don't have a good idea. > a. SQL intended to run under secure search_path. No class-specific rules. > src/bin code is an example of this class, and this is the only secure class > for end-user applications. > > b. SQL intended for a particular search_path, possibly untrusted. Unqualified > names need an exact match. Use a qualified name for any object whose > schema appears in search_path later than some untrusted schema. Examples > of this class include extension scripts, pre-CVE-2018-1058 pg_dump, some > functions with "SET search_path", and many casual end-user applications. > > c. SQL intended to work the same under every search_path setting. Do not use > any unqualified name. Most pg_catalog and contrib functions, but not those > having a "SET search_path" annotation, are examples of this class. If I understand correctly, the only difference between (b) and (c) is that references to an object in the very first schema in the search path could be unqualified; in most cases, that would be pg_catalog. So I guess what we need is a UI that lets you say either: - Don't tell me about anything, or - Tell me about all unqualified references, or - Tell me about all unqualified references except those that latch onto an object in <list of schemas> Personally, I'm not entirely sold on having that third capability. I guess it's valuable, but the second one seems like the much more valuable thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 08, 2018 at 12:00:45PM +0530, Robert Haas wrote: > On Sun, Aug 5, 2018 at 1:34 PM, Noah Misch <noah@leadboat.com> wrote: > > If hackers and non-core extension authors are to write such code, let's make > > it easier to check the work. > > +1. Better still would be to invent a way to remove the need for such > onerous qualification, but I don't have a good idea. Agreed. Thanks for thinking about it. > > a. SQL intended to run under secure search_path. No class-specific rules. > > src/bin code is an example of this class, and this is the only secure class > > for end-user applications. > > > > b. SQL intended for a particular search_path, possibly untrusted. Unqualified > > names need an exact match. Use a qualified name for any object whose > > schema appears in search_path later than some untrusted schema. Examples > > of this class include extension scripts, pre-CVE-2018-1058 pg_dump, some > > functions with "SET search_path", and many casual end-user applications. > > > > c. SQL intended to work the same under every search_path setting. Do not use > > any unqualified name. Most pg_catalog and contrib functions, but not those > > having a "SET search_path" annotation, are examples of this class. > > If I understand correctly, the only difference between (b) and (c) is > that references to an object in the very first schema in the search > path could be unqualified; in most cases, that would be pg_catalog. In (b), there's no bound on the number of schemas for which qualification is optional. Consider search_path=pg_temp,pg_catalog,admin,"$user",public with you trusting each of those schemas except "public". The rules of (b) then do not require any qualified names, though unqualified names shall arrange an exact match of argument types. On the other hand, with search_path=public,equally_public, one shall qualify all names of objects located in equally_public. > So I guess what we need is a UI that lets you say either: > > - Don't tell me about anything, or > - Tell me about all unqualified references, or > - Tell me about all unqualified references except those that latch > onto an object in <list of schemas> "SELECT exp(1.0)" merits a warning under search_path=public,pg_catalog but not under search_path=pg_catalog,public. Thus, it's more subtle than your last bullet. That's why I envisioned the user declaring which schemas to trust. The system uses that and the actual search_path to choose warnings. > Personally, I'm not entirely sold on having that third capability. I > guess it's valuable, but the second one seems like the much more > valuable thing. That's fair. Even without pursuing that, it's valuable to know the list of trusted schemas so we can relax about exact argument type matches when the function is in a trusted schema. For example, pg_catalog.exp(1) is safe because no hostile user can create pg_catalog.exp(int4).
Robert Haas <robertmhaas@gmail.com> writes: > +1. Better still would be to invent a way to remove the need for such > onerous qualification, but I don't have a good idea. Yeah. > So I guess what we need is a UI that lets you say either: > - Don't tell me about anything, or > - Tell me about all unqualified references, or > - Tell me about all unqualified references except those that latch > onto an object in <list of schemas> > Personally, I'm not entirely sold on having that third capability. I > guess it's valuable, but the second one seems like the much more > valuable thing. I'm not sold on #2 either. That path leads to, for example, s/=/OPERATOR(pg_catalog.=)/g everywhere, which is utterly catastrophic to both readability and portability of your SQL code. We *must* find a way to do better, not tell people that's what to do. When the security team was discussing this issue before, we speculated about ideas like inventing a function trust mechanism, so that attacks based on search path manipulations would fail even if they managed to capture an operator reference. I'd rather go down that path than encourage people to do more schema qualification. regards, tom lane
On 8 August 2018 at 09:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
When the security team was discussing this issue before, we speculated
about ideas like inventing a function trust mechanism, so that attacks
based on search path manipulations would fail even if they managed to
capture an operator reference. I'd rather go down that path than
encourage people to do more schema qualification.
I must be missing something. Aren't search_path manipulation problems avoided by using "SET search_path FROM CURRENT"?
While I'm asking, does anybody know why this isn't the default, especially for SECURITY DEFINER functions? It seems like in addition to being a more secure default, it would be better for JIT compilation - right now it seems you need to re-compile whenever the function is called with a different search_path. The ability for a function's meaning to change dramatically depending on the caller's search_path seems like an occasionally-useful extra, not what one would expect as the default.
Isaac Morland <isaac.morland@gmail.com> writes: > On 8 August 2018 at 09:58, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> When the security team was discussing this issue before, we speculated >> about ideas like inventing a function trust mechanism, so that attacks >> based on search path manipulations would fail even if they managed to >> capture an operator reference. I'd rather go down that path than >> encourage people to do more schema qualification. > I must be missing something. Aren't search_path manipulation problems > avoided by using "SET search_path FROM CURRENT"? No, not if you have any publicly-writable schemas anywhere in your search path. The reason this business is so nasty is that our historical default ("$user, public", with pg_catalog implicitly searched before that) is actually insecure, even for references intended to go to pg_catalog. There are too many cases where the ambiguous-operator resolution rules will allow a reference to be captured by a similarly-named operator later in the search path. If you're using a secure search_path to start with, it's possible that decorating your functions with SET search_path FROM CURRENT would be helpful, but it's not like that's some kind of magic bullet. > While I'm asking, does anybody know why this isn't the default, especially > for SECURITY DEFINER functions? It might fix some subset of security issues, but I think that changing the default behavior like that would break a bunch of other use-cases. It'd be especially surprising for such a thing to apply only to SECURITY DEFINER functions. The bigger picture here is that it's really hard to fix this class of problems by trying to dictate changes in the way people have their search paths set up. You're much more likely to break working applications than help anybody. I'm still of the opinion that we're going to be forced to provide loopholes in what we did to pg_dump et al in CVE-2018-1058. We've been seeing an increasing number of complaints about that since the patches came out, and I'm pretty sure that most of the complainers are totally uninterested in defending against share-my-database-with-a-hostile-user scenarios. Why should they have to complicate their lives because of problems that other people might have? The advantage of a function trust mechanism is that it'd provide security against calling functions you didn't intend to without any visible changes in normal application behavior. The security team gave up on that approach because it seemed too complicated to pursue as a secretly-developed security patch, but I still think it's the right long-term answer. regards, tom lane
> On Aug 8, 2018, at 7:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Isaac Morland <isaac.morland@gmail.com> writes: >> On 8 August 2018 at 09:58, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> When the security team was discussing this issue before, we speculated >>> about ideas like inventing a function trust mechanism, so that attacks >>> based on search path manipulations would fail even if they managed to >>> capture an operator reference. I'd rather go down that path than >>> encourage people to do more schema qualification. > >> I must be missing something. Aren't search_path manipulation problems >> avoided by using "SET search_path FROM CURRENT"? > > No, not if you have any publicly-writable schemas anywhere in your > search path. The reason this business is so nasty is that our > historical default ("$user, public", with pg_catalog implicitly > searched before that) is actually insecure, even for references > intended to go to pg_catalog. There are too many cases where the > ambiguous-operator resolution rules will allow a reference to be > captured by a similarly-named operator later in the search path. > > If you're using a secure search_path to start with, it's possible > that decorating your functions with SET search_path FROM CURRENT > would be helpful, but it's not like that's some kind of magic bullet. > >> While I'm asking, does anybody know why this isn't the default, especially >> for SECURITY DEFINER functions? > > It might fix some subset of security issues, but I think that changing > the default behavior like that would break a bunch of other use-cases. > It'd be especially surprising for such a thing to apply only to > SECURITY DEFINER functions. > > The bigger picture here is that it's really hard to fix this class > of problems by trying to dictate changes in the way people have their > search paths set up. You're much more likely to break working > applications than help anybody. I'm still of the opinion that we're > going to be forced to provide loopholes in what we did to pg_dump > et al in CVE-2018-1058. We've been seeing an increasing number > of complaints about that since the patches came out, and I'm pretty > sure that most of the complainers are totally uninterested in > defending against share-my-database-with-a-hostile-user scenarios. > Why should they have to complicate their lives because of problems > that other people might have? > > The advantage of a function trust mechanism is that it'd provide > security against calling functions you didn't intend to without > any visible changes in normal application behavior. The security > team gave up on that approach because it seemed too complicated to > pursue as a secretly-developed security patch, but I still think > it's the right long-term answer. Do you have a WIP patch partially developed for this? If it is no longer secret, perhaps the rest of us could take a look? mark
Mark Dilger <hornschnorter@gmail.com> writes: > On Aug 8, 2018, at 7:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The advantage of a function trust mechanism is that it'd provide >> security against calling functions you didn't intend to without >> any visible changes in normal application behavior. The security >> team gave up on that approach because it seemed too complicated to >> pursue as a secretly-developed security patch, but I still think >> it's the right long-term answer. > Do you have a WIP patch partially developed for this? If it is no > longer secret, perhaps the rest of us could take a look? Yeah, I do have a POC prototype, let me blow the dust off it ... regards, tom lane
On Wed, Aug 08, 2018 at 09:58:38AM -0400, Tom Lane wrote: > I'm not sold on #2 either. That path leads to, for example, > s/=/OPERATOR(pg_catalog.=)/g everywhere, which is utterly catastrophic > to both readability and portability of your SQL code. We *must* find > a way to do better, not tell people that's what to do. > > When the security team was discussing this issue before, we speculated > about ideas like inventing a function trust mechanism, so that attacks > based on search path manipulations would fail even if they managed to > capture an operator reference. I'd rather go down that path than > encourage people to do more schema qualification. Interesting. If we got a function trust mechanism, how much qualification would you then like? Here are the levels I know about, along with their implications: -- (1) Use qualified references and exact match for all objects. -- -- Always secure, even if schema usage does not conform to ddl-schemas-patterns -- and function trust is disabled. -- -- Subject to denial of service from anyone able to CREATE in cube schema or -- earthdistance schema. CREATE FUNCTION latitude(earth) RETURNS float8 LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS $$SELECT CASE WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) @extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8 WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) @extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8 ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord( $1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) @extschema@.earth())) END$$; -- (2) Use qualified references for objects outside pg_catalog. -- -- With function trust disabled, this would be subject to privilege escalation -- from anyone able to CREATE in cube schema. -- -- Subject to denial of service from anyone able to CREATE in cube schema or -- earthdistance schema. CREATE FUNCTION latitude(earth) RETURNS float8 LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS $$SELECT CASE WHEN @cube_schema@.cube_ll_coord($1, 3) / @extschema@.earth() < -1 THEN -90::float8 WHEN @cube_schema@.cube_ll_coord($1, 3) / @extschema@.earth() > 1 THEN 90::float8 ELSE degrees(asin(@cube_schema@.cube_ll_coord($1, 3) / @extschema@.earth())) END$$; -- (3) "SET search_path" with today's code. -- -- Security and reliability considerations are the same as (2). Today, this -- reduces performance by suppressing optimizations like inlining. CREATE FUNCTION latitude(earth) RETURNS float8 LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE SET search_path FROM CURRENT AS $$SELECT CASE WHEN cube_ll_coord($1, 3) / earth() < -1 THEN -90::float8 WHEN cube_ll_coord($1, 3) / earth() > 1 THEN 90::float8 ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) END$$; -- (4) Today's code (reformatted). -- -- Always secure if schema usage conforms to ddl-schemas-patterns, even if -- function trust is disabled. If cube schema or earthdistance schema is not in -- search_path, function doesn't work. CREATE FUNCTION latitude(earth) RETURNS float8 LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS $$SELECT CASE WHEN cube_ll_coord($1, 3) / earth() < -1 THEN -90::float8 WHEN cube_ll_coord($1, 3) / earth() > 1 THEN 90::float8 ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) END$$;
On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote: > -- (3) "SET search_path" with today's code. > -- > -- Security and reliability considerations are the same as (2). Today, this > -- reduces performance by suppressing optimizations like inlining. Out of curiosity, why does this suppress inlining? Anyways, my preference would be to have syntax by which to say: resolve at declaration time using the then-in-effect search_path and store as-qualified. This could just be SET search_path without an assignment. CREATE FUNCTION ... AS $$ ... $$ SET search_path; Another possibility would be to have a way to set a search_path for all expressions in a given schema, something like: SET SCHEMA my_schema DEFAULT search_path = ...; which would apply to all expressions in schema elements in schema "my_schema": - CHECK expressions - INDEX expressions - VIEWs and MATERIALIZED VIEWs - FUNCTION and STORED PROCEDURE bodies - ... CREATE SCHEMA IF NOT EXISTS my_schema; SET SCHEMA my_schema DEFAULT search_path = my_schema, my_other_schema; CREATE OR REPLACE FUNCTION foo() ... AS $$ ... $$; ... Nico --
On Wed, Aug 08, 2018 at 10:47:04AM -0400, Tom Lane wrote: > Isaac Morland <isaac.morland@gmail.com> writes: > > While I'm asking, does anybody know why this isn't the default, especially > > for SECURITY DEFINER functions? > > It might fix some subset of security issues, but I think that changing > the default behavior like that would break a bunch of other use-cases. > It'd be especially surprising for such a thing to apply only to > SECURITY DEFINER functions. Some projects consider breaking backwards compatibility to fix security problems (within reason, and with discussion) to be a fair thing to do. Already people have to qualify their apps for every release of PG. I think this problem very much deserves a good solution. Nico --
On Sat, Aug 11, 2018 at 03:32:23PM -0500, Nico Williams wrote: > On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote: > > -- (3) "SET search_path" with today's code. > > -- > > -- Security and reliability considerations are the same as (2). Today, this > > -- reduces performance by suppressing optimizations like inlining. > > Out of curiosity, why does this suppress inlining? The function call machinery, fmgr_security_definer(), is what actually applies the setting. To inline such functions, one would need to develop a representation that attaches the setting change to the nodes resulting from the inlining. When evaluating said nodes, apply the attached setting. > Anyways, my preference would be to have syntax by which to say: resolve > at declaration time using the then-in-effect search_path and store > as-qualified. Agreed; having that would be great. (I mentioned it as option (7) of https://postgr.es/m/20180710014308.GA805781@rfd.leadboat.com.) It has limitations, though: - Does not help with inexact argument type matches. - While the applicability to sql-language functions seems clear, other languages don't benefit as much. You might extend it to a subset of PL/pgSQL functions, excluding e.g. ones that contain EXECUTE. I see no chance to help PL/Perl or PL/Python. - Unlikely to be a good back-patch candidate. > Another possibility would be to have a way to set a search_path for all > expressions in a given schema, something like: > > SET SCHEMA my_schema DEFAULT search_path = ...; > > which would apply to all expressions in schema elements in schema > "my_schema": > > - CHECK expressions > - INDEX expressions > - VIEWs and MATERIALIZED VIEWs > - FUNCTION and STORED PROCEDURE bodies > - ... > > CREATE SCHEMA IF NOT EXISTS my_schema; > > SET SCHEMA my_schema DEFAULT search_path = my_schema, my_other_schema; > > CREATE OR REPLACE FUNCTION foo() ... AS $$ ... $$; That's interesting. I suspect we'd first want to make inlining possible.
On Wed, Aug 8, 2018 at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not sold on #2 either. That path leads to, for example, > s/=/OPERATOR(pg_catalog.=)/g everywhere, which is utterly catastrophic > to both readability and portability of your SQL code. We *must* find > a way to do better, not tell people that's what to do. > > When the security team was discussing this issue before, we speculated > about ideas like inventing a function trust mechanism, so that attacks > based on search path manipulations would fail even if they managed to > capture an operator reference. I'd rather go down that path than > encourage people to do more schema qualification. The more I think about it, the more I think having a way to set a lexically-scoped search path is probably the answer. Many years ago, Perl had only "local" as a way of declaring local variables, which used dynamic scoping, and then they invented "my", which uses lexical scoping, and everybody abandoned "local" and started using "my," because otherwise a reference to a variable inside a procedure may happen to refer to a local variable further up the call stack rather than a global variable if the names happen to collide. It turns out that not knowing to which object a reference will refer is awful, and you should avoid it like the plague. This is exactly the same problem we have with search_path. People define functions -- or index expressions -- assuming that the function and operator references will refer to the same thing at execution time that they do at definition time. That is, I think, an entirely *reasonable* expectation. Basically all programming languages work that way. If you could call a C function in such a way that the meaning of identifiers that appeared there was dependent on some piece of state inherited from the caller's context rather than from the declarations visible at the time that the C function was compiled, it would be utter and complete chaos. It would in fact greatly resemble the present state of affairs in PostgreSQL, where making your code reliably mean what you intended requires either forcing the search path everywhere or schema-qualifying the living daylights out of everything. I think we should try to follow Perl's example, acknowledge that we basically got this wrong on the first try, and invent something new that works better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote: > The more I think about it, the more I think having a way to set a > lexically-scoped search path is probably the answer. Many years ago, > Perl had only "local" as a way of declaring local variables, which > used dynamic scoping, and then they invented "my", which uses lexical > scoping, and everybody abandoned "local" and started using "my," > because otherwise a reference to a variable inside a procedure may > happen to refer to a local variable further up the call stack rather > than a global variable if the names happen to collide. It turns out > that not knowing to which object a reference will refer is awful, and > you should avoid it like the plague. This is exactly the same problem > we have with search_path. People define functions -- or index > expressions -- assuming that the function and operator references will > refer to the same thing at execution time that they do at definition > time. > > That is, I think, an entirely *reasonable* expectation. Basically all > programming languages work that way. If you could call a C function > in such a way that the meaning of identifiers that appeared there was > dependent on some piece of state inherited from the caller's context > rather than from the declarations visible at the time that the C > function was compiled, it would be utter and complete chaos. It would > in fact greatly resemble the present state of affairs in PostgreSQL, > where making your code reliably mean what you intended requires either > forcing the search path everywhere or schema-qualifying the living > daylights out of everything. I think we should try to follow Perl's > example, acknowledge that we basically got this wrong on the first > try, and invent something new that works better. Well, in C, if the calling function is not in the current C file, you really don't know what function you are linking to --- it depends on what other object files are linked into the executable. I am unclear how lexical scoping helps with this, or with Postgres. With Postgres you are effectively adding functions into the executable that didn't exist at link time, and potentially replacing compiled-in functions with others that might match the call site better. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Tue, Aug 14, 2018 at 3:31 PM, Bruce Momjian <bruce@momjian.us> wrote: > Well, in C, if the calling function is not in the current C file, you > really don't know what function you are linking to --- it depends on > what other object files are linked into the executable. True. > I am unclear how lexical scoping helps with this, or with Postgres. Well, if you say sqrt(x) in C, and you don't have sqrt or x defined in the current function, you know that you're going to call a global function called sqrt and pass to it a global variable called x. You're not going to latch onto a local variable called x in the function that called your function, and if the calling function has a local variable called sqrt, that doesn't matter either. The linker can point every called to sqrt() in the program at a different place than expected, but you can't monkey with the behavior of an individual call by having the calling function declare identifiers that mask the ones the function intended to reference. On the other hand, when you call an SQL-language function, or even to some extent a plpgsql-language function, from PostgreSQL, you can in fact change the meanings of every identifier that appears in that function body unless those references are all schema-qualified or the function sets the search path. If an SQL-language function calls sqrt(x), you can set the search_path to some schema you've created where there is a completely different sqrt() function than that function intended to call. That is a very good way to make stuff (a) break or (b) be insecure. The point of having a lexically-scoped search path is that it is generally the case that when you write SQL code, you know the search path under which you want it to execute. You can get that behavior today by attaching a SET clause to the function, but that defeats inlining, is slower, and the search path you configure applies not only to the code to which is attached but to any code which it in turn calls. In other words, the search path setting which you configure has dynamic scope. I submit that's bad. Functions can reasonably be expected to know the search path that should be used to run the code they actually contain, but they cannot and should not need to know the search path that should be used to run code in other functions which they happen to call. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 14, 2018 at 04:23:33PM -0400, Robert Haas wrote: > On Tue, Aug 14, 2018 at 3:31 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Well, in C, if the calling function is not in the current C file, you > > really don't know what function you are linking to --- it depends on > > what other object files are linked into the executable. > > True. > > > I am unclear how lexical scoping helps with this, or with Postgres. > > Well, if you say sqrt(x) in C, and you don't have sqrt or x defined in > the current function, you know that you're going to call a global > function called sqrt and pass to it a global variable called x. > You're not going to latch onto a local variable called x in the > function that called your function, and if the calling function has a > local variable called sqrt, that doesn't matter either. The linker > can point every called to sqrt() in the program at a different place > than expected, but you can't monkey with the behavior of an individual > call by having the calling function declare identifiers that mask the > ones the function intended to reference. What we are doing is more like C++ virtual functions, where the class calling it can replace function calls in subfunctions: https://www.geeksforgeeks.org/virtual-function-cpp/ > On the other hand, when you call an SQL-language function, or even to > some extent a plpgsql-language function, from PostgreSQL, you can in > fact change the meanings of every identifier that appears in that > function body unless those references are all schema-qualified or the > function sets the search path. If an SQL-language function calls I think we decide that search path alone for functions is insufficient because of data type matching, unless the schema is secure. > sqrt(x), you can set the search_path to some schema you've created > where there is a completely different sqrt() function than that > function intended to call. That is a very good way to make stuff (a) > break or (b) be insecure. > > The point of having a lexically-scoped search path is that it is > generally the case that when you write SQL code, you know the search > path under which you want it to execute. You can get that behavior > today by attaching a SET clause to the function, but that defeats > inlining, is slower, and the search path you configure applies not > only to the code to which is attached but to any code which it in turn > calls. In other words, the search path setting which you configure > has dynamic scope. I submit that's bad. Functions can reasonably be > expected to know the search path that should be used to run the code > they actually contain, but they cannot and should not need to know the > search path that should be used to run code in other functions which > they happen to call. So you are saying PG functions should lock down their search path at function definition time, and use that for all function invocations? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
> On Aug 14, 2018, at 8:00 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Aug 8, 2018 at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not sold on #2 either. That path leads to, for example, >> s/=/OPERATOR(pg_catalog.=)/g everywhere, which is utterly catastrophic >> to both readability and portability of your SQL code. We *must* find >> a way to do better, not tell people that's what to do. >> >> When the security team was discussing this issue before, we speculated >> about ideas like inventing a function trust mechanism, so that attacks >> based on search path manipulations would fail even if they managed to >> capture an operator reference. I'd rather go down that path than >> encourage people to do more schema qualification. > > The more I think about it, the more I think having a way to set a > lexically-scoped search path is probably the answer. Many years ago, > Perl had only "local" as a way of declaring local variables, which > used dynamic scoping, and then they invented "my", which uses lexical > scoping, and everybody abandoned "local" and started using "my," > because otherwise a reference to a variable inside a procedure may > happen to refer to a local variable further up the call stack rather > than a global variable if the names happen to collide. It turns out > that not knowing to which object a reference will refer is awful, and > you should avoid it like the plague. This is exactly the same problem > we have with search_path. People define functions -- or index > expressions -- assuming that the function and operator references will > refer to the same thing at execution time that they do at definition > time. > > That is, I think, an entirely *reasonable* expectation. Basically all > programming languages work that way. If you could call a C function > in such a way that the meaning of identifiers that appeared there was > dependent on some piece of state inherited from the caller's context > rather than from the declarations visible at the time that the C > function was compiled, it would be utter and complete chaos. It would > in fact greatly resemble the present state of affairs in PostgreSQL, > where making your code reliably mean what you intended requires either > forcing the search path everywhere or schema-qualifying the living > daylights out of everything. I think we should try to follow Perl's > example, acknowledge that we basically got this wrong on the first > try, and invent something new that works better. I think you are interpreting the problem too broadly. This is basically just a privilege escalation attack vector. If there is a function that gets run under different privileges than those of the caller, and if the caller can coerce the function to do something with those elevated privileges that the function author did not intend, then the caller can do mischief. But if the caller is executing a function that operates under the caller's own permissions, then there is no mischief possible, since the caller can't trick the function to do anything that the caller couldn't do herself directly. For example, if there is a function that takes type anyarray and then performs operations over the elements of that array using operator=, there is no problem with me defining an operator= prior to calling that function and having my operator be the one that gets used, assuming the function doesn't escalate privileges. To my mind, the search_path should get set whenever SetUserIdAndSecContext or similar gets called and restored when SetUserIdAndSecContext gets called to restore the saved uid. Currently coded stuff like: GetUserIdAndSecContext(&saved_uid, &save_sec_context); ... if (saved_uid != authid_uid) SetUserIdAndSecContext(authid_uid, save_sec_context | SECURITY_LOCAL_USERID_CHANGE); ... SetUserIdAndSecContext(saved_uid, save_sec_context); need to have a third variable, saved_search_path or similar. Thoughts? mark
On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote: > The more I think about it, the more I think having a way to set a > lexically-scoped search path is probably the answer. [...] Yes please! This is what I want. Evaluate the search_path at function definition time, and record code with fully-qualified symbols in the catalog. Nico --
On 08/14/2018 07:01 PM, Nico Williams wrote: > On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote: >> The more I think about it, the more I think having a way to set a >> lexically-scoped search path is probably the answer. [...] > > Yes please! > > This is what I want. Evaluate the search_path at function definition > time, and record code with fully-qualified symbols in the catalog. What would the interface to that look like for out-of-tree PL maintainers? -Chap
On Aug 14, 2018, at 4:01 PM, Nico Williams <nico@cryptonector.com> wrote: > > On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote: >> The more I think about it, the more I think having a way to set a >> lexically-scoped search path is probably the answer. [...] > > Yes please! > > This is what I want. Evaluate the search_path at function definition > time, and record code with fully-qualified symbols in the catalog. Unfortunately, that falls apart for relocatable extensions.
On Tue, Aug 14, 2018 at 04:42:43PM -0400, Bruce Momjian wrote: > On Tue, Aug 14, 2018 at 04:23:33PM -0400, Robert Haas wrote: > > On Tue, Aug 14, 2018 at 3:31 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > I am unclear how lexical scoping helps with this, or with Postgres. > > > > Well, if you say sqrt(x) in C, and you don't have sqrt or x defined in > > the current function, you know that you're going to call a global > > function called sqrt and pass to it a global variable called x. > > You're not going to latch onto a local variable called x in the > > function that called your function, and if the calling function has a > > local variable called sqrt, that doesn't matter either. The linker > > can point every called to sqrt() in the program at a different place > > than expected, but you can't monkey with the behavior of an individual > > call by having the calling function declare identifiers that mask the > > ones the function intended to reference. > > What we are doing is more like C++ virtual functions, where the class > calling it can replace function calls in subfunctions: > > https://www.geeksforgeeks.org/virtual-function-cpp/ > > > On the other hand, when you call an SQL-language function, or even to > > some extent a plpgsql-language function, from PostgreSQL, you can in > > fact change the meanings of every identifier that appears in that > > function body unless those references are all schema-qualified or the > > function sets the search path. If an SQL-language function calls > > I think we decide that search path alone for functions is insufficient > because of data type matching, unless the schema is secure. Right. For what it's worth, the example I permuted upthread might look like this in a lexical search path world: -- Always secure, even if schema usage does not conform to ddl-schemas-patterns -- and function trust is disabled or unavailable. -- -- At CREATE EXTENSION time only, subject to denial of service from anyone able -- to CREATE in cube schema or earthdistance schema. -- -- Objects in @cube_schema@ are qualified so objects existing in @extschema@ at -- CREATE EXTENSION time cannot mask them. CREATE FUNCTION latitude(earth) RETURNS float8 LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE AS $$SELECT CASE WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) / earth() < -1 THEN -90::float8 WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) / earth() > 1 THEN 90::float8 ELSE degrees(asin(@cube_schema@.cube_ll_coord( $1::@cube_schema@.cube, 3) / earth())) END$$;
On Tue, Aug 14, 2018 at 4:42 PM, Bruce Momjian <bruce@momjian.us> wrote: > So you are saying PG functions should lock down their search path at > function definition time, and use that for all function invocations? Yes, mostly. I don't think we can just change the existing behavior; it would break a catastrophic amount of stuff. But we could add an optional feature that does this, and encourage people to use it, much the way Perl continues to support "local" even though "my" has been a best practice for several decades. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 14, 2018 at 11:50:24PM +0000, Nasby, Jim wrote: > On Aug 14, 2018, at 4:01 PM, Nico Williams <nico@cryptonector.com> wrote: > > > > On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote: > >> The more I think about it, the more I think having a way to set a > >> lexically-scoped search path is probably the answer. [...] > > > > Yes please! > > > > This is what I want. Evaluate the search_path at function definition > > time, and record code with fully-qualified symbols in the catalog. > > Unfortunately, that falls apart for relocatable extensions. It would only require recomputing the bindings at relocation time. IMO PG should not allow extension relocation after installation either, but sadly it does. Nico --
On Tue, Aug 14, 2018 at 5:14 PM, Mark Dilger <hornschnorter@gmail.com> wrote: > I think you are interpreting the problem too broadly. This is basically > just a privilege escalation attack vector. Hmm. Well, I think you're interpreting the problem too narrowly. :-) In my view, the problem isn't just that there is a risk of privilege escalation; in fact, for many users, that's not a serious problem at all because the database has only one user anyway. I think that the core problem is that you can easily write code that looks like it will work just fine and then have it not work at all. For example, somebody: - creates an SQL or PL/plgsql function - uses it to define a functional index - runs REINDEX with a different search path - gets an error because there's some unqualified or ambiguous reference inside the SQL function There might be a security risk here if the REINDEX is run by a different user with elevated privileges, because then maybe one of the references inside the SQL or PL/plgsql function could be captured by a new and malicious object injected into the alternative search path, but that's not really the point. The point is that you might reasonably want this to WORK AS INTENDED, not just fail to be insecure. If your house gets hit by lightning and, as a result, your alarm system goes offline, you do have a potential security problem, but also, every appliance in your house is probably fried, and your house may be on fire. Focusing on the security system specifically is likely the wrong approach. So here. What we want, I think, is a way to create the SQL or PL/pgsql function in such a way that it knows what search path was intended and uses that search path for the SQL that occurs within it. Then, the security problem goes away, but even better, the code works. This is also my basic complaint about the TRUST mechanism: it's not a bad idea, but it seems to me to be tackling the problem from the wrong end. If somebody tries to make it so that when I run pg_dump they hack superuser, having pg_dump error out is surely better than letting them hack superuser. However, having pg_dump error out, while surely better than a privilege compromise, is also not great, because I now don't have a backup. In a sense, I haven't eliminated the security vulnerability at all; I've just reduced the consequences from hack-superuser to deny-successful-backup. But deny-successful-backup is still a pretty bad consequence. What I really want is for pg_dump to succeed in giving me a correct backup, without compromising security. Similar with an attack on any other application. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 14, 2018 at 10:44 PM, Noah Misch <noah@leadboat.com> wrote: > Right. For what it's worth, the example I permuted upthread might look like > this in a lexical search path world: > > -- Always secure, even if schema usage does not conform to ddl-schemas-patterns > -- and function trust is disabled or unavailable. > -- > -- At CREATE EXTENSION time only, subject to denial of service from anyone able > -- to CREATE in cube schema or earthdistance schema. > -- > -- Objects in @cube_schema@ are qualified so objects existing in @extschema@ at > -- CREATE EXTENSION time cannot mask them. > CREATE FUNCTION latitude(earth) > RETURNS float8 > LANGUAGE SQL > IMMUTABLE STRICT > PARALLEL SAFE > AS $$SELECT CASE > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > / > earth() < -1 THEN -90::float8 > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > / > earth() > 1 THEN 90::float8 > ELSE degrees(asin(@cube_schema@.cube_ll_coord( > $1::@cube_schema@.cube, 3) / earth())) > END$$; Could we get rid of the remaining qualification by setting the lexical search path to @cube_schema@, @extschema@, public? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 15, 2018 at 11:05:06AM -0400, Robert Haas wrote: > On Tue, Aug 14, 2018 at 4:42 PM, Bruce Momjian <bruce@momjian.us> wrote: > > So you are saying PG functions should lock down their search path at > > function definition time, and use that for all function invocations? > > Yes, mostly. I don't think we can just change the existing behavior; > it would break a catastrophic amount of stuff. But we could add an > optional feature that does this, and encourage people to use it, much > the way Perl continues to support "local" even though "my" has been a > best practice for several decades. So each function defines its search_path, and each function you call sets its own search_path, basically? That is what you mean by lexical scope? I think if this approach was fully secure, it would get more traction. We can't lock down the called objects at function definition time since the languages aren't necessarily parsed at that time, and some objects might not exist yet, right? It might be interesting to record a list of objects and their owners at function definition time and check that somehow at function execution time. In relation to your "break a catastrophic amount of stuff," I assume if we have a trust_users GUC, users would be more willing to have expansive search_path values because security concerns would be lessened. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
> On Aug 15, 2018, at 9:02 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Aug 14, 2018 at 5:14 PM, Mark Dilger <hornschnorter@gmail.com> wrote: >> I think you are interpreting the problem too broadly. This is basically >> just a privilege escalation attack vector. > > Hmm. Well, I think you're interpreting the problem too narrowly. :-) > > In my view, the problem isn't just that there is a risk of privilege > escalation; in fact, for many users, that's not a serious problem at > all because the database has only one user anyway. I think that the > core problem is that you can easily write code that looks like it will > work just fine and then have it not work at all. For example, > somebody: > > - creates an SQL or PL/plgsql function > - uses it to define a functional index > - runs REINDEX with a different search path > - gets an error because there's some unqualified or ambiguous > reference inside the SQL function > > There might be a security risk here if the REINDEX is run by a > different user with elevated privileges, because then maybe one of the > references inside the SQL or PL/plgsql function could be captured by a > new and malicious object injected into the alternative search path, > but that's not really the point. The point is that you might > reasonably want this to WORK AS INTENDED, not just fail to be > insecure. If your house gets hit by lightning and, as a result, your > alarm system goes offline, you do have a potential security problem, > but also, every appliance in your house is probably fried, and your > house may be on fire. Focusing on the security system specifically is > likely the wrong approach. I'm like some aspects of what you are proposing. The ability to define local variables in perl is not a bug. I frequently use undef(local $/) and I wouldn't want that taken away by some perl developer who decides that the next version of perl won't have local variables. (Maybe this went away with perl 6 -- I haven't checked.) You want to fix the problem by having an option to lexically scope variables. That's fine as far as it goes. But if the option to not use lexically scoped variables, and to use dynamically scoped variables remains, with the function author choosing between the two modes, then you're basically writing off the users of dynamically scoped variables by saying, in effect, "we're never going to fix the security problem you have, and if you want to be secure, stop using that deprecated dynamically scoped variable feature, move into the 21st century, and use lexically scoped variables like Robert does". Go ahead and define some new lexical scope mechanism. I'm probably going to move into the 21st century with you and use it. (I mostly use "my", not "local", when I write perl code.) But let's treat that as a new feature independent of fixing the security problems with dynamically scoped variables and privilege escalation attacks. mark
Mark Dilger <hornschnorter@gmail.com> writes: > Go ahead and define some new lexical scope mechanism. I'm probably > going to move into the 21st century with you and use it. (I mostly > use "my", not "local", when I write perl code.) But let's treat that as > a new feature independent of fixing the security problems with dynamically > scoped variables and privilege escalation attacks. To my mind, the really fundamental problem here is that our default/ traditional search_path includes publicly writable schemas. I don't especially care whether you establish your search_path setting dynamically, or lexically, or through magic hocus pocus: if it includes a schema that a hostile user can create objects in, you'll get pwned sooner not later. So it doesn't seem to me that Robert's proposal is really doing much to move the goalposts. He's presuming that the main problem is that user A and user B have different ideas about which schemas are trusted ... but I think that that is, at most, a minority use-case. I suspect that a system-wide definition of which schemas are trusted would serve many installations just fine. The point of the function trust proposal was to provide some degree of security even in an environment where there are untrusted schemas in your search path. However, the other approach we could take is to deprecate doing that: tell people to stop putting executable objects in "public", or maybe even deprecate "public" altogether, so that PG search paths are treated more like Unix PATH settings. You don't put /tmp in your PATH. The problem with that, of course, is the extent to which it'd break existing applications --- but having to run around and decorate your functions with lexical search path settings isn't exactly a free lunch either. regards, tom lane
On Wed, Aug 15, 2018 at 12:04:53PM -0400, Robert Haas wrote: > On Tue, Aug 14, 2018 at 10:44 PM, Noah Misch <noah@leadboat.com> wrote: > > Right. For what it's worth, the example I permuted upthread might look like > > this in a lexical search path world: > > > > -- Always secure, even if schema usage does not conform to ddl-schemas-patterns > > -- and function trust is disabled or unavailable. > > -- > > -- At CREATE EXTENSION time only, subject to denial of service from anyone able > > -- to CREATE in cube schema or earthdistance schema. > > -- > > -- Objects in @cube_schema@ are qualified so objects existing in @extschema@ at > > -- CREATE EXTENSION time cannot mask them. > > CREATE FUNCTION latitude(earth) > > RETURNS float8 > > LANGUAGE SQL > > IMMUTABLE STRICT > > PARALLEL SAFE > > AS $$SELECT CASE > > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > > / > > earth() < -1 THEN -90::float8 > > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > > / > > earth() > 1 THEN 90::float8 > > ELSE degrees(asin(@cube_schema@.cube_ll_coord( > > $1::@cube_schema@.cube, 3) / earth())) > > END$$; > > Could we get rid of the remaining qualification by setting the lexical > search path to @cube_schema@, @extschema@, public? That would make the @cube_schema@ qualifications superfluous, but you would do s/earth()/@extschema@.earth()/ to avoid a @cube_schema@.earth() winning. In general, when a query needs to find objects in N maybe-untrusted schemas, you'll schema-qualify objects of at least N-1 of the schemas.
On Wed, Aug 15, 2018 at 10:40:55AM -0500, Nico Williams wrote: > On Tue, Aug 14, 2018 at 11:50:24PM +0000, Nasby, Jim wrote: > > On Aug 14, 2018, at 4:01 PM, Nico Williams <nico@cryptonector.com> wrote: > > > > > > On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote: > > >> The more I think about it, the more I think having a way to set a > > >> lexically-scoped search path is probably the answer. [...] > > > > > > Yes please! > > > > > > This is what I want. Evaluate the search_path at function definition > > > time, and record code with fully-qualified symbols in the catalog. > > > > Unfortunately, that falls apart for relocatable extensions. > > It would only require recomputing the bindings at relocation time. IMO > PG should not allow extension relocation after installation either, but > sadly it does. Agreed. I wouldn't mind deprecating relocatable=true, if it helps.
> On Aug 15, 2018, at 9:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Mark Dilger <hornschnorter@gmail.com> writes: >> Go ahead and define some new lexical scope mechanism. I'm probably >> going to move into the 21st century with you and use it. (I mostly >> use "my", not "local", when I write perl code.) But let's treat that as >> a new feature independent of fixing the security problems with dynamically >> scoped variables and privilege escalation attacks. > > To my mind, the really fundamental problem here is that our default/ > traditional search_path includes publicly writable schemas. I don't > especially care whether you establish your search_path setting > dynamically, or lexically, or through magic hocus pocus: if it includes a > schema that a hostile user can create objects in, you'll get pwned sooner > not later. So it doesn't seem to me that Robert's proposal is really > doing much to move the goalposts. He's presuming that the main problem > is that user A and user B have different ideas about which schemas are > trusted ... but I think that that is, at most, a minority use-case. > I suspect that a system-wide definition of which schemas are trusted > would serve many installations just fine. > > The point of the function trust proposal was to provide some degree of > security even in an environment where there are untrusted schemas in your > search path. However, the other approach we could take is to deprecate > doing that: tell people to stop putting executable objects in "public", > or maybe even deprecate "public" altogether, so that PG search paths are > treated more like Unix PATH settings. You don't put /tmp in your PATH. This is the part I don't understand. Of course, I don't put /tmp or ./ in my path, but a well configured system doesn't get hacked simply because a non-root user puts those in her path: + CREATE TABLE security_test_table (str TEXT); + CREATE FUNCTION add_prefix(prefix TEXT, str TEXT) RETURNS TEXT AS $$ + SELECT $1 || ': ' || $2; + $$ LANGUAGE sql IMMUTABLE STRICT; + CREATE FUNCTION security_test_table_trigger_proc () RETURNS TRIGGER AS $$ + BEGIN + NEW.str := add_prefix('admin', NEW.str); + RETURN NEW; + END; + $$ LANGUAGE plpgsql; + CREATE TRIGGER security_test_table_trigger + BEFORE INSERT OR UPDATE + ON security_test_table + FOR EACH ROW + EXECUTE PROCEDURE security_test_table_trigger_proc(); + CREATE ROLE alice + NOSUPERUSER + NOCREATEDB + NOCREATEROLE + NOINHERIT + NOREPLICATION + NOBYPASSRLS + CONNECTION LIMIT 1; + CREATE SCHEMA alice; + GRANT ALL ON SCHEMA alice TO alice; + GRANT INSERT, SELECT ON security_test_table TO alice; + SET ROLE alice; + CREATE FUNCTION alice.add_prefix(prefix TEXT, str TEXT) RETURNS TEXT AS $$ + SELECT 'alice: ' || $2; + $$ LANGUAGE sql IMMUTABLE STRICT; + SET search_path TO alice, public; + INSERT INTO security_test_table (str) VALUES ('monkey'); + RESET ROLE; + SELECT * FROM security_test_table; + str + --------------- + alice: monkey + (1 row) + Unless you are going to prevent Alice from having her own schema in which she can create her own functions, this needs fixing. I don't see point about not putting 'alice' in the path, since Alice can do so. What am I missing here? mark
On Wed, Aug 15, 2018 at 3:17 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Wed, Aug 15, 2018 at 11:05:06AM -0400, Robert Haas wrote: >> On Tue, Aug 14, 2018 at 4:42 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > So you are saying PG functions should lock down their search path at >> > function definition time, and use that for all function invocations? >> >> Yes, mostly. I don't think we can just change the existing behavior; >> it would break a catastrophic amount of stuff. But we could add an >> optional feature that does this, and encourage people to use it, much >> the way Perl continues to support "local" even though "my" has been a >> best practice for several decades. > > So each function defines its search_path, and each function you call sets > its own search_path, basically? That is what you mean by lexical scope? > I think if this approach was fully secure, it would get more traction. By lexical scope, I mean a search_path that applies only to the SQL queries that textually appear within that function definition, not code that they call indirectly. The SET clause attached to a function uses dynamic scope. See https://en.wikipedia.org/wiki/Scope_(computer_science)#Lexical_scope_vs._dynamic_scope -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 16, 2018 at 12:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Mark Dilger <hornschnorter@gmail.com> writes: >> Go ahead and define some new lexical scope mechanism. I'm probably >> going to move into the 21st century with you and use it. (I mostly >> use "my", not "local", when I write perl code.) But let's treat that as >> a new feature independent of fixing the security problems with dynamically >> scoped variables and privilege escalation attacks. > > To my mind, the really fundamental problem here is that our default/ > traditional search_path includes publicly writable schemas. I don't > especially care whether you establish your search_path setting > dynamically, or lexically, or through magic hocus pocus: if it includes a > schema that a hostile user can create objects in, you'll get pwned sooner > not later. So it doesn't seem to me that Robert's proposal is really > doing much to move the goalposts. He's presuming that the main problem > is that user A and user B have different ideas about which schemas are > trusted ... but I think that that is, at most, a minority use-case. > I suspect that a system-wide definition of which schemas are trusted > would serve many installations just fine. I agree that if this is the sort of thing you are worried about, then function trust is a reasonable step in a good direction. It is definitely the case that letting hostile users put stuff into your search path provides a large number of ways for them to potentially pwn you. I do wonder whether, instead of de-trusting those users, you ought not to instead deny them the ability to create objects in your search_path, either by changing their privileges or by changing your own search path. But if you can't do that for some reason, de-trusting executable objects that they create in that search_path is a good thing to be able to do. But that's not really where I think the biggest problems are, even though I agree that those are problems. In my view, the biggest problem is that sometimes you may need to write code that works with any search_path setting. pg_dump is a good example. It's got to iterate through all of the schemas in the system and dump them all. Extensions are another example. If you install the earthdistance extension, it's supposed to work no matter how search_path is set when the functions it contains get called. Functions used in defining expression indexes are another example; you want them to work the same way no matter what the prevailing search_path happens to be when tuples are inserted or the table gets reindexed. As long as it's straightforward to swap out the search path under which the function was intended to execute for some other one, stuff is going to break. Again, this is not an argument against function trust, which I actually think is a pretty good proposal. It just doesn't address the thing I personally consider to be the biggest problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 16, 2018 at 03:46:06PM -0400, Robert Haas wrote: > On Wed, Aug 15, 2018 at 3:17 PM, Bruce Momjian <bruce@momjian.us> wrote: > > On Wed, Aug 15, 2018 at 11:05:06AM -0400, Robert Haas wrote: > >> On Tue, Aug 14, 2018 at 4:42 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> > So you are saying PG functions should lock down their search path at > >> > function definition time, and use that for all function invocations? > >> > >> Yes, mostly. I don't think we can just change the existing behavior; > >> it would break a catastrophic amount of stuff. But we could add an > >> optional feature that does this, and encourage people to use it, much > >> the way Perl continues to support "local" even though "my" has been a > >> best practice for several decades. > > > > So each function defines its search_path, and each function you call sets > > its own search_path, basically? That is what you mean by lexical scope? > > I think if this approach was fully secure, it would get more traction. > > By lexical scope, I mean a search_path that applies only to the SQL > queries that textually appear within that function definition, not > code that they call indirectly. The SET clause attached to a function > uses dynamic scope. See > https://en.wikipedia.org/wiki/Scope_(computer_science)#Lexical_scope_vs._dynamic_scope I understand you don't like that a search_path changed by a function is passed down to functions it calls, but what would you like the system to use as a search path for called functions? The search path of the session? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 08/16/18 21:31, Bruce Momjian wrote: > I understand you don't like that a search_path changed by a function is > passed down to functions it calls, but what would you like the system to > use as a search path for called functions? The search path of the > session? This is beginning to sound like an exercise in /emulating/ lexical scoping in a system that can run bits of code supplied in many different PLs, not all of which can necessarily participate in "real" lexical scoping. It could look like another flavor of setting a GUC. There's SET SESSION, SET LOCAL, and maybe now SET LEXICAL. The key to the emulation is to temporarily unstack the SET LEXICAL when the function it "lexically" applies to makes calls to other functions. And restack it when those other functions return. Then there's room for bikeshedding on just where those unstacks/restacks happen. I would think at clearly defined places in the core code, so as to be independent of any PL. Any execution of command/query through SPI ? Any function call through fmgr? Emulating lexical scoping through dynamic means will have an overhead. The usual use case would not involve a SET LEXICAL command issued on its own, but such a command bound into a function declaration, as the current SET commands can be. /Some/ PLs may be amenable to a static analysis able to determine what objects are referred to in the code and bind those ahead of time. Maybe, like the lanvalidator entry in pg_language, there could be a lanprebind function, for those PLs where that's tractable, and it would be called when the function is created with any "lexically" bound variables. Record with each function whether lanprebind returned success; if it did, skip the dynamic variable futzing around calls to that function. If the PL has no lanprebind function, or if it returned no-can-do for a particular function and set of lexical settings, then the dynamic emulation steps in. Sounds a bit elaborate, but as it happens, there are further reasons it could be good to have such "lexical" settings. There are some called for in the SQL standard, for example. [1] -Chap [1] https://www.postgresql.org/message-id/5AAEAE0F.20006%40anastigmatix.net
On Thu, Aug 16, 2018 at 10:58:21PM -0400, Chapman Flack wrote: > On 08/16/18 21:31, Bruce Momjian wrote: > > > I understand you don't like that a search_path changed by a function is > > passed down to functions it calls, but what would you like the system to > > use as a search path for called functions? The search path of the > > session? > > This is beginning to sound like an exercise in /emulating/ lexical scoping > in a system that can run bits of code supplied in many different PLs, not > all of which can necessarily participate in "real" lexical scoping. > > It could look like another flavor of setting a GUC. There's SET SESSION, > SET LOCAL, and maybe now SET LEXICAL. The key to the emulation is to > temporarily unstack the SET LEXICAL when the function it "lexically" > applies to makes calls to other functions. And restack it when those > other functions return. Right. My point is that people don't like dynamic scoping in Perl because it overrides the lexical scoping implicit in the source code structure. For Postgres objects, like functions, there is no implicit scope --- each object is defined and managed on its own. So, the problem isn't that dynamic scoping is overriding implicit scoping, but that we only have dynamic scoping, and there is no implicit scoping to override. So, to move this forward, we have to have some way of defining implicit scoping. It could be: o other objects in the same schema o owned by the same user o a member of the same extension o a SET search_path in the function o the search_path at object creation time o objects referenced at object creation time (I guess Oracle's package syntax helps them here.) You have to define some implicit scoping so you don't have to rely on dynamic scoping. Two of Robert's problem cases were expression indexes that are later reindexed, and calling an extension that wants to call functions from the same extension, even if the search_path does not include the extension's schema. So, which implicit scoping rules would fix those problems? In addition, you have people adding, updating, and removing objects while these functions are required to still run properly. That is something that also doesn't typically happen in a Perl program, because the source files or package APIs are assumed to be tightly managed. Also, with Perl, you have a layer of objects, from built-in ones, to packages, to application-level objects. With Postgres, they are all thrown together into the same tables, and there is no filtering of scopes. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote: > On Wed, Aug 08, 2018 at 09:58:38AM -0400, Tom Lane wrote: > > When the security team was discussing this issue before, we speculated > > about ideas like inventing a function trust mechanism, so that attacks > > based on search path manipulations would fail even if they managed to > > capture an operator reference. I'd rather go down that path than > > encourage people to do more schema qualification. > > Interesting. If we got a function trust mechanism, how much qualification > would you then like? Here are the levels I know about, along with their > implications: Any preferences among these options and the fifth option I gave in https://postgr.es/m/20180815024429.GA3535710@rfd.leadboat.com? I don't want to leave earthdistance broken. So far, though, it seems no two people accept any one fix. > -- (1) Use qualified references and exact match for all objects. > -- > -- Always secure, even if schema usage does not conform to ddl-schemas-patterns > -- and function trust is disabled. > -- > -- Subject to denial of service from anyone able to CREATE in cube schema or > -- earthdistance schema. > CREATE FUNCTION latitude(earth) > RETURNS float8 > LANGUAGE SQL > IMMUTABLE STRICT > PARALLEL SAFE > AS $$SELECT CASE > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > OPERATOR(pg_catalog./) > @extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8 > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > OPERATOR(pg_catalog./) > @extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8 > ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord( > $1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) @extschema@.earth())) > END$$; > > > -- (2) Use qualified references for objects outside pg_catalog. > -- > -- With function trust disabled, this would be subject to privilege escalation > -- from anyone able to CREATE in cube schema. > -- > -- Subject to denial of service from anyone able to CREATE in cube schema or > -- earthdistance schema. > CREATE FUNCTION latitude(earth) > RETURNS float8 > LANGUAGE SQL > IMMUTABLE STRICT > PARALLEL SAFE > AS $$SELECT CASE > WHEN @cube_schema@.cube_ll_coord($1, 3) > / > @extschema@.earth() < -1 THEN -90::float8 > WHEN @cube_schema@.cube_ll_coord($1, 3) > / > @extschema@.earth() > 1 THEN 90::float8 > ELSE degrees(asin(@cube_schema@.cube_ll_coord($1, 3) / @extschema@.earth())) > END$$; > > > -- (3) "SET search_path" with today's code. > -- > -- Security and reliability considerations are the same as (2). Today, this > -- reduces performance by suppressing optimizations like inlining. > CREATE FUNCTION latitude(earth) > RETURNS float8 > LANGUAGE SQL > IMMUTABLE STRICT > PARALLEL SAFE > SET search_path FROM CURRENT > AS $$SELECT CASE > WHEN cube_ll_coord($1, 3) > / > earth() < -1 THEN -90::float8 > WHEN cube_ll_coord($1, 3) > / > earth() > 1 THEN 90::float8 > ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) > END$$; > > > -- (4) Today's code (reformatted). > -- > -- Always secure if schema usage conforms to ddl-schemas-patterns, even if > -- function trust is disabled. If cube schema or earthdistance schema is not in > -- search_path, function doesn't work. > CREATE FUNCTION latitude(earth) > RETURNS float8 > LANGUAGE SQL > IMMUTABLE STRICT > PARALLEL SAFE > AS $$SELECT CASE > WHEN cube_ll_coord($1, 3) > / > earth() < -1 THEN -90::float8 > WHEN cube_ll_coord($1, 3) > / > earth() > 1 THEN 90::float8 > ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) > END$$;
On Thu, Aug 30, 2018 at 12:06:09AM -0700, Noah Misch wrote: > On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote: > > On Wed, Aug 08, 2018 at 09:58:38AM -0400, Tom Lane wrote: > > > When the security team was discussing this issue before, we speculated > > > about ideas like inventing a function trust mechanism, so that attacks > > > based on search path manipulations would fail even if they managed to > > > capture an operator reference. I'd rather go down that path than > > > encourage people to do more schema qualification. > > > > Interesting. If we got a function trust mechanism, how much qualification > > would you then like? Here are the levels I know about, along with their > > implications: > > Any preferences among these options and the fifth option I gave in > https://postgr.es/m/20180815024429.GA3535710@rfd.leadboat.com? I don't want > to leave earthdistance broken. So far, though, it seems no two people accept > any one fix. Ping. I prefer (1) for most functions. The loss of readability doesn't weigh on me much, because contrib and pg_proc.dat don't have much SQL code. I would use (3) for ts_debug(), because that function is long, not performance-critical, and uses only pg_catalog objects; there may be a few other functions like that. Implementing function trust would raise the absolute attractiveness of (2), but I think I would continue to prefer (1). If we can't agree on something here, (4) stands, and earthdistance functions will continue to fail unless both the cube extension's schema and the earthdistance extension's schema are in search_path. That's bad, and I don't want to prolong it. I don't think implementing function trust or a lexical search_path makes (4) cease to be bad. Implementing both, however, would make (4) non-bad. > > -- (1) Use qualified references and exact match for all objects. > > -- > > -- Always secure, even if schema usage does not conform to ddl-schemas-patterns > > -- and function trust is disabled. > > -- > > -- Subject to denial of service from anyone able to CREATE in cube schema or > > -- earthdistance schema. > > CREATE FUNCTION latitude(earth) > > RETURNS float8 > > LANGUAGE SQL > > IMMUTABLE STRICT > > PARALLEL SAFE > > AS $$SELECT CASE > > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > > OPERATOR(pg_catalog./) > > @extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8 > > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > > OPERATOR(pg_catalog./) > > @extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8 > > ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord( > > $1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) @extschema@.earth())) > > END$$; > > > > > > -- (2) Use qualified references for objects outside pg_catalog. > > -- > > -- With function trust disabled, this would be subject to privilege escalation > > -- from anyone able to CREATE in cube schema. > > -- > > -- Subject to denial of service from anyone able to CREATE in cube schema or > > -- earthdistance schema. > > CREATE FUNCTION latitude(earth) > > RETURNS float8 > > LANGUAGE SQL > > IMMUTABLE STRICT > > PARALLEL SAFE > > AS $$SELECT CASE > > WHEN @cube_schema@.cube_ll_coord($1, 3) > > / > > @extschema@.earth() < -1 THEN -90::float8 > > WHEN @cube_schema@.cube_ll_coord($1, 3) > > / > > @extschema@.earth() > 1 THEN 90::float8 > > ELSE degrees(asin(@cube_schema@.cube_ll_coord($1, 3) / @extschema@.earth())) > > END$$; > > > > > > -- (3) "SET search_path" with today's code. > > -- > > -- Security and reliability considerations are the same as (2). Today, this > > -- reduces performance by suppressing optimizations like inlining. > > CREATE FUNCTION latitude(earth) > > RETURNS float8 > > LANGUAGE SQL > > IMMUTABLE STRICT > > PARALLEL SAFE > > SET search_path FROM CURRENT > > AS $$SELECT CASE > > WHEN cube_ll_coord($1, 3) > > / > > earth() < -1 THEN -90::float8 > > WHEN cube_ll_coord($1, 3) > > / > > earth() > 1 THEN 90::float8 > > ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) > > END$$; > > > > > > -- (4) Today's code (reformatted). > > -- > > -- Always secure if schema usage conforms to ddl-schemas-patterns, even if > > -- function trust is disabled. If cube schema or earthdistance schema is not in > > -- search_path, function doesn't work. > > CREATE FUNCTION latitude(earth) > > RETURNS float8 > > LANGUAGE SQL > > IMMUTABLE STRICT > > PARALLEL SAFE > > AS $$SELECT CASE > > WHEN cube_ll_coord($1, 3) > > / > > earth() < -1 THEN -90::float8 > > WHEN cube_ll_coord($1, 3) > > / > > earth() > 1 THEN 90::float8 > > ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) > > END$$; >
On Wed, Dec 05, 2018 at 11:20:52PM -0800, Noah Misch wrote: > On Thu, Aug 30, 2018 at 12:06:09AM -0700, Noah Misch wrote: > > On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote: > > > On Wed, Aug 08, 2018 at 09:58:38AM -0400, Tom Lane wrote: > > > > When the security team was discussing this issue before, we speculated > > > > about ideas like inventing a function trust mechanism, so that attacks > > > > based on search path manipulations would fail even if they managed to > > > > capture an operator reference. I'd rather go down that path than > > > > encourage people to do more schema qualification. > > > > > > Interesting. If we got a function trust mechanism, how much qualification > > > would you then like? Here are the levels I know about, along with their > > > implications: > > > > Any preferences among these options and the fifth option I gave in > > https://postgr.es/m/20180815024429.GA3535710@rfd.leadboat.com? I don't want > > to leave earthdistance broken. So far, though, it seems no two people accept > > any one fix. > > Ping. I prefer (1) for most functions. Nobody else has stated an explicit preference, but here are the votes I would expect based on other comments within this thread (feel free to correct): Option-1: Misch Option-4: Lane Option-5: Haas I would be okay with option-5, which presupposes a lexical search path for functions. Due to the magnitude of adding that feature, I don't expect to attempt an implementation myself. (Would anyone else like to?) > If we can't agree on something here, (4) stands, and earthdistance functions > will continue to fail unless both the cube extension's schema and the > earthdistance extension's schema are in search_path. That's bad, and I don't > want to prolong it. I don't think implementing function trust or a lexical > search_path makes (4) cease to be bad. Implementing both, however, would make > (4) non-bad. > > > > -- (1) Use qualified references and exact match for all objects. > > > -- > > > -- Always secure, even if schema usage does not conform to ddl-schemas-patterns > > > -- and function trust is disabled. > > > -- > > > -- Subject to denial of service from anyone able to CREATE in cube schema or > > > -- earthdistance schema. > > > CREATE FUNCTION latitude(earth) > > > RETURNS float8 > > > LANGUAGE SQL > > > IMMUTABLE STRICT > > > PARALLEL SAFE > > > AS $$SELECT CASE > > > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > > > OPERATOR(pg_catalog./) > > > @extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8 > > > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > > > OPERATOR(pg_catalog./) > > > @extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8 > > > ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord( > > > $1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) @extschema@.earth())) > > > END$$; > > > > > > > > > -- (2) Use qualified references for objects outside pg_catalog. > > > -- > > > -- With function trust disabled, this would be subject to privilege escalation > > > -- from anyone able to CREATE in cube schema. > > > -- > > > -- Subject to denial of service from anyone able to CREATE in cube schema or > > > -- earthdistance schema. > > > CREATE FUNCTION latitude(earth) > > > RETURNS float8 > > > LANGUAGE SQL > > > IMMUTABLE STRICT > > > PARALLEL SAFE > > > AS $$SELECT CASE > > > WHEN @cube_schema@.cube_ll_coord($1, 3) > > > / > > > @extschema@.earth() < -1 THEN -90::float8 > > > WHEN @cube_schema@.cube_ll_coord($1, 3) > > > / > > > @extschema@.earth() > 1 THEN 90::float8 > > > ELSE degrees(asin(@cube_schema@.cube_ll_coord($1, 3) / @extschema@.earth())) > > > END$$; > > > > > > > > > -- (3) "SET search_path" with today's code. > > > -- > > > -- Security and reliability considerations are the same as (2). Today, this > > > -- reduces performance by suppressing optimizations like inlining. > > > CREATE FUNCTION latitude(earth) > > > RETURNS float8 > > > LANGUAGE SQL > > > IMMUTABLE STRICT > > > PARALLEL SAFE > > > SET search_path FROM CURRENT > > > AS $$SELECT CASE > > > WHEN cube_ll_coord($1, 3) > > > / > > > earth() < -1 THEN -90::float8 > > > WHEN cube_ll_coord($1, 3) > > > / > > > earth() > 1 THEN 90::float8 > > > ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) > > > END$$; > > > > > > > > > -- (4) Today's code (reformatted). > > > -- > > > -- Always secure if schema usage conforms to ddl-schemas-patterns, even if > > > -- function trust is disabled. If cube schema or earthdistance schema is not in > > > -- search_path, function doesn't work. > > > CREATE FUNCTION latitude(earth) > > > RETURNS float8 > > > LANGUAGE SQL > > > IMMUTABLE STRICT > > > PARALLEL SAFE > > > AS $$SELECT CASE > > > WHEN cube_ll_coord($1, 3) > > > / > > > earth() < -1 THEN -90::float8 > > > WHEN cube_ll_coord($1, 3) > > > / > > > earth() > 1 THEN 90::float8 > > > ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) > > > END$$; > >