Thread: pie-in-sky idea: 'sensitive' function parameters
CREATE FUNCTION commence_primary_ignition(target text, password text) RETURNS void AS ...; SELECT commence_primary_ignition(target=>'Alderaan', password=>'1234'); /* 1234 could appear in logs, activity stats ... disturbing */ CREATE OR REPLACE FUNCTION commence_primary_ignition( target text, password text SENSITIVE) RETURNS void AS ...; SELECT commence_primary_ignition(target=>'Alderaan', password=>'1234'); /* logs, stats show something like ... password=>[redacted]); */ I had this idea while thinking about how to send a query with some sensitive data in a way that would protect the data from log exposure, and realizing I couldn't think of any. Docs for log_statement even promise that extended query protocol Bind parameters do get included. (But docs also suggested that some minimal parsing occurs before log_statement happens, so there could be an opportunity to detect that a sensitive parameter is being mentioned and the extents of the assigned expression.) I also appreciate that one doesn't want to empower a user to conceal queries at will (thus track_activities and log_statement are SUSET), which led me to the idea of a function parameter declaration, putting the function definer in control of what bits should get redacted. In full generality, I'm sure this would present lots of implementation challenges. But various restrictions might possibly simplify it.... - recognize only in calls using named notation for the sensitive parameter(s)? (nothing to check for function calls using only positional notation) - introduce a special named-notation variant? target=>'Alderaan', password=!>'1234' (no function lookup needed during parse, can identify what to redact immediately, and defer a simple check that 'password' really is declared sensitive to some later stage when function is looked up) - recognize only in extended-protocol queries where the parameter is supplied by a Bind? (know what to redact without having to parse arbitrarily hairy expressions) Would anyone else see some value in this capability? Could it (or some suitable restriction of it) seem implementable, or would the complications be overwhelming? -Chap
Chapman Flack <chap@anastigmatix.net> writes: > ... which led me to the idea of a function parameter > declaration, putting the function definer in control of what > bits should get redacted. +1 for thinking outside the box, but ... > Would anyone else see some value in this capability? Could it > (or some suitable restriction of it) seem implementable, or would > the complications be overwhelming? ... the problem with this idea is that knowledge that the item ought to be hidden would be obtained only very late in the parsing process. So for example if you fat-fingered something just to the left of the function call in the query text, or the name of the function itself, your password would still get exposed in the log. This indeed is the core problem with every proposal I've seen for semantics-based log filtering. Error logging needs to be considered as a very low-level operation, because reports may come out when little if anything is known about the real semantics of the query. regards, tom lane
On 3 February 2018 at 11:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chapman Flack <chap@anastigmatix.net> writes:
> ... which led me to the idea of a function parameter
> declaration, putting the function definer in control of what
> bits should get redacted.
+1 for thinking outside the box, but ...
> Would anyone else see some value in this capability? Could it
> (or some suitable restriction of it) seem implementable, or would
> the complications be overwhelming?
... the problem with this idea is that knowledge that the item ought to be
hidden would be obtained only very late in the parsing process. So for
example if you fat-fingered something just to the left of the function
call in the query text, or the name of the function itself, your password
would still get exposed in the log.
This indeed is the core problem with every proposal I've seen for
semantics-based log filtering. Error logging needs to be considered
as a very low-level operation, because reports may come out when
little if anything is known about the real semantics of the query.
About the only time I think it's really viable to pursue is if it's restricted to bind parameters. We only log those later and more selectively as it is, so it seems much more reasonable to say "I never want <parameter X> to appear in the logs".
That said, I'm not sure it can be done at the function-interface level, or if it'd have to be done in the Bind message to make it reliable and available early enough.
On 02/02/18 22:46, Tom Lane wrote: > ... the problem with this idea is that knowledge that the item ought to be > hidden would be obtained only very late in the parsing process. So for > example if you fat-fingered something just to the left of the function > call in the query text, or the name of the function itself, your password > would still get exposed in the log. That didn't seem like a deal-breaker to me, as the common case I had imagined for it would be where the query is coded into something (a webapp, say) that just has some bits of sensitive data to pass in, and would surely just be tested on dummy data until it was clear the canned query was at least free of fat-finger issues. Indeed, should the feature have to be restricted for practicality's sake to only working with bound parameters and certain protocol variants, it might not be easy or even possible to use it improvisationally from psql, but that might be an acceptable limitation. On 02/03/18 02:14, Craig Ringer wrote: > About the only time I think it's really viable to pursue is if it's > restricted to bind parameters. We only log those later and more > selectively as it is, so it seems much more reasonable to say "I never > want <parameter X> to appear in the logs". And I think that could be an acceptable restriction. One way could use a simple flag to accompany the parameter binding from the client, which would be recognized early enough to keep the parameter out of the logs, but also checked later when the function info is available, throwing an error if it was used for a parameter not declared sensitive, or not used for a parameter that is. Or it could even have to be used in a prepared statement. A little clunkier and one more round trip to the server, but probably not onerous, and by the time PREPARE completes, surely it can be known which parameter slots are declared that way. Still wouldn't account for internal statements logged or errors thrown by whatever the function does, but maybe the existing machinery for declaring functions leakproof is at least a start on that? -Chap
Just giving this thread a bump in honor of the mention of sensitive things in logs in the cryptography unconference session. I'm not partisan about any of the particular syntax examples I gave earlier, but it seems like there were two key ingredients: 1. In what is sent from the client to the server, certain parameters are marked as sensitive in ways that are obvious early at parse time, before having to look up or analyze anything. 2. But those markings are later compared to the actual declarations of things, once those are resolved. It is an error either to send as 'sensitive' a parameter that isn't so declared, or to forget to send as 'sensitive' one that is. The first error is to prevent evildoers using the facility to hide values from the logs arbitrarily in the queries they are sending. The second error is to catch mistakes during app development. It's possible that a query sent by an app under development won't have the right things marked 'sensitive' yet, and it's possible those queries get exposed in the logs because the early parse-time indication that they shouldn't be is missing. But at that stage of development, the values being sent shouldn't really be sensitive ones, and making such a query an error ensures such omissions are caught and fixed. Regards, -Chap
On Fri, May 29, 2020 at 12:38 PM Chapman Flack <chap@anastigmatix.net> wrote: > Just giving this thread a bump in honor of the mention of sensitive > things in logs in the cryptography unconference session. I agree that there's a problem here and I would like to see a solution to it, too. Several EnterpriseDB customers have complained about the fact that ALTER USER .. PASSWORD logs the password, which is a similar kind of problem, though it involves DDL rather than DML. But, I don't really see a way forward that doesn't involve breaking stuff that works today. We have options to log queries before parsing them. You can't redact sensitive details at that point because you don't know whether the query contains any such details, or where they are located. You can't postpone logging the query without changing the behavior of the system in user-visible ways. What if parsing gets stuck in an infinite loop and you want to look at the log to see which query got stuck? What if you want to look at the timestamps on the log messages to know when the server started dealing with some particular query? On a related note, what if parsing fails? Not only should the logging that happens before parsing already have been done, but the parse error itself should also log the query that triggered it, so that the user has some chance of troubleshooting the situation. However, since we haven't completed parsing, there's no way to redact what gets logged in such a case; we can't know where the sensitive stuff is. Now, you might say that you only care about redacting queries that are well-formed, but parsing can fail for other reasons - e.g. because you ran out of memory. You could argue that such cases are unlikely enough that you don't care about them, but that doesn't mean somebody else won't file a CVE. Also, even if you are logging after parsing has been successfully completed, does that mean that you know which parts of the query contain sensitive information? Parse trees contain some positional information, but I'm not sure that there's enough information there, or that we have it in all the cases somebody might want to redact. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 05/29/20 14:51, Robert Haas wrote: > stuff that works today. We have options to log queries before parsing > them. You can't redact sensitive details at that point because you > don't know whether the query contains any such details, or where they > are located. You can't postpone logging the query without changing the > behavior of the system in user-visible ways. What if parsing gets > stuck in an infinite loop and you want to look at the log to see which > query got stuck? What if you want to look at the timestamps on the log A possible step in the direction of good-enough would be to have 'sensitive' flags only in the parameter-bind message of the extended protocol. The query you send down has only placeholders. The sensitive stuff comes in the bind message. Recognizing a 'sensitive' bit for a parameter in a bind message ought to be entirely feasible. If parsing the query failed and got logged because of the failure, the log record just shows the placeholders. Again, to protect from the other kind of misuse (a malicious query hiding values from the log), there should be a check after parsing/ analysis that the values sent with 'sensitive' flags in the bind message are exactly the ones that were so declared. A stored function/procedure might have a way to declare some of its parameters that way, with some default role controlling permission to declare such a function. Maybe you could even declare such parameters directly in the syntax of preparing a query (with, again, control over who gets to prepare such a query). There again, the correspondence of those declarations to the bind message can't be verified until after parsing/analysis, but meanwhile the sensitive values are clearly flagged in the bind message, and any logging on a parse failure is still able to avoid logging them. Regards, -Chap
Chapman Flack <chap@anastigmatix.net> writes: > A possible step in the direction of good-enough would be to have > 'sensitive' flags only in the parameter-bind message of the extended > protocol. Yeah, if we restrict our promises to being willing to not log (some) bind parameters, that seems far more manageable, and reliable, than figuring out what parts of query strings need to be hidden. One missing part of that is that we'd need to support bind parameters for utility statements, eg new password in ALTER USER. That's been on the wish list for a long time anyway, of course. I think it's mostly a matter of lack of round tuits, rather than any fundamental problem. (Parameters in transaction control statements might have fundamental problems, but we can just dismiss that as out of scope.) I don't have a lot of faith (not any, really) in your notion of tying control of the feature to function arguments. That falls down for all of the simpler cases I can think of: aside from the ALTER USER PASSWORD case, there's INSERT INTO accounts(..., creditcardnumber, ...) VALUES(..., $n, ...). Neither one of those have a nearby UDF to control it with. Also, what would we do with something like sensitive_function($1 || 'constant') ? regards, tom lane
On Fri, May 29, 2020 at 3:05 PM Chapman Flack <chap@anastigmatix.net> wrote: > A possible step in the direction of good-enough would be to have > 'sensitive' flags only in the parameter-bind message of the extended > protocol. Interesting idea. Changing the wire protocol for this sort of thing makes it a much bigger lift, but it might help, at least in some cases. It does however require that the user being using prepared queries, and that the user know which data is sensitive. For instance, if a user is sitting there in a psql shell and the requirement is that if they happen to type ALTER USER .. PASSWORD the new password doesn't get logged, this approach fails both because the user doesn't do anything to identify the query as special, and also because it's not prepared. So in a certain sense you could say that with this design the server just passes the buck: it's too hard for us to figure out which things to log, so we're making it your job to tell us.... Another point to consider is that I think we already have the ability to suppress logging of bind parameters. So, people who want this sort of thing and are able to use bind parameters can just not log them and then all is well.That does have the deficiency that we then log NO bind parameters, and somebody might want to log them in some situations but not others, but perhaps there is a simpler way of accomplishing that than what you've outlined here? I don't know. I'm just throwing out ideas... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, May 29, 2020 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > One missing part of that is that we'd need to support bind parameters > for utility statements, eg new password in ALTER USER. That's been > on the wish list for a long time anyway, of course. I think it's > mostly a matter of lack of round tuits, rather than any fundamental > problem. (Parameters in transaction control statements might have > fundamental problems, but we can just dismiss that as out of scope.) I might be wrong, but isn't this, like, a ton of work? All the places in the DDL grammar that currently accept string literals would have to be changed to also allow parameters, and then the downstream code would need to be adjusted to look through those parameters to find the corresponding values. Or is there a less painful approach? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 05/29/20 15:26, Tom Lane wrote: > all of the simpler cases I can think of: aside from the ALTER USER > PASSWORD case, there's INSERT INTO accounts(..., creditcardnumber, > ...) VALUES(..., $n, ...). Neither one of those have a nearby UDF > to control it with. I was thinking incrementally ... something about UDFs only might be quickish to do as a PoC. And is already useful, because if exposure of a particular thing bothers you enough, you can make a UDF or P to control it with. But ultimately, if ALTER USER PASSWORD has sensitivity of its parameter hardcoded in, and CREATE TABLE ACCOUNTS can declare creditcardnumber SENSITIVE, then maybe those bits go out to the client in the parameter Describe message, and come back in the Bind message, without the user even necessarily thinking about it. Regards, -Chap
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, May 29, 2020 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One missing part of that is that we'd need to support bind parameters >> for utility statements, eg new password in ALTER USER. That's been >> on the wish list for a long time anyway, of course. I think it's >> mostly a matter of lack of round tuits, rather than any fundamental >> problem. (Parameters in transaction control statements might have >> fundamental problems, but we can just dismiss that as out of scope.) > I might be wrong, but isn't this, like, a ton of work? I'm not sure how much work, but yeah, there'd be work to do. I don't think there are all that many places where we really have just a string literal (of course, ALTER USER PASSWORD is a poster child exception). I think a large fraction of the interesting cases are places where there's some amount of expression eval capability already, eg parameters in EXECUTE. Syntactically you can already do execute foo(1, $1 + 2); and the only part of that that doesn't work is passing in a parameter. regards, tom lane