Thread: pie-in-sky idea: 'sensitive' function parameters

pie-in-sky idea: 'sensitive' function parameters

From
Chapman Flack
Date:
CREATE FUNCTION commence_primary_ignition(target text, password text)
RETURNS void AS ...;

SELECT commence_primary_ignition(target=>'Alderaan', password=>'1234');

/* 1234 could appear in logs, activity stats ... disturbing */



CREATE OR REPLACE FUNCTION commence_primary_ignition(
    target text,
    password text SENSITIVE) RETURNS void AS ...;

SELECT commence_primary_ignition(target=>'Alderaan', password=>'1234');

/* logs, stats show something like ... password=>[redacted]); */


I had this idea while thinking about how to send a query with
some sensitive data in a way that would protect the data from
log exposure, and realizing I couldn't think of any. Docs for
log_statement even promise that extended query protocol Bind
parameters do get included. (But docs also suggested that some
minimal parsing occurs before log_statement happens, so there
could be an opportunity to detect that a sensitive parameter
is being mentioned and the extents of the assigned expression.)

I also appreciate that one doesn't want to empower a user to
conceal queries at will (thus track_activities and log_statement
are SUSET), which led me to the idea of a function parameter
declaration, putting the function definer in control of what
bits should get redacted.

In full generality, I'm sure this would present lots of
implementation challenges. But various restrictions might
possibly simplify it....

- recognize only in calls using named notation for the
  sensitive parameter(s)? (nothing to check for function
  calls using only positional notation)

- introduce a special named-notation variant?
  target=>'Alderaan', password=!>'1234'
  (no function lookup needed during parse, can identify what
  to redact immediately, and defer a simple check that 'password'
  really is declared sensitive to some later stage when function is
  looked up)

- recognize only in extended-protocol queries where the parameter
  is supplied by a Bind? (know what to redact without having to
  parse arbitrarily hairy expressions)


Would anyone else see some value in this capability? Could it
(or some suitable restriction of it) seem implementable, or would
the complications be overwhelming?

-Chap


Re: pie-in-sky idea: 'sensitive' function parameters

From
Tom Lane
Date:
Chapman Flack <chap@anastigmatix.net> writes:
> ... which led me to the idea of a function parameter
> declaration, putting the function definer in control of what
> bits should get redacted.

+1 for thinking outside the box, but ...

> Would anyone else see some value in this capability? Could it
> (or some suitable restriction of it) seem implementable, or would
> the complications be overwhelming?

... the problem with this idea is that knowledge that the item ought to be
hidden would be obtained only very late in the parsing process.  So for
example if you fat-fingered something just to the left of the function
call in the query text, or the name of the function itself, your password
would still get exposed in the log.

This indeed is the core problem with every proposal I've seen for
semantics-based log filtering.  Error logging needs to be considered
as a very low-level operation, because reports may come out when
little if anything is known about the real semantics of the query.

            regards, tom lane


Re: pie-in-sky idea: 'sensitive' function parameters

From
Craig Ringer
Date:
On 3 February 2018 at 11:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chapman Flack <chap@anastigmatix.net> writes:
> ... which led me to the idea of a function parameter
> declaration, putting the function definer in control of what
> bits should get redacted.

+1 for thinking outside the box, but ...

> Would anyone else see some value in this capability? Could it
> (or some suitable restriction of it) seem implementable, or would
> the complications be overwhelming?

... the problem with this idea is that knowledge that the item ought to be
hidden would be obtained only very late in the parsing process.  So for
example if you fat-fingered something just to the left of the function
call in the query text, or the name of the function itself, your password
would still get exposed in the log.

This indeed is the core problem with every proposal I've seen for
semantics-based log filtering.  Error logging needs to be considered
as a very low-level operation, because reports may come out when
little if anything is known about the real semantics of the query.

 
About the only time I think it's really viable to pursue is if it's restricted to bind parameters. We only log those later and more selectively as it is, so it seems much more reasonable to say "I never want <parameter X> to appear in the logs".

That said, I'm not sure it can be done at the function-interface level, or if it'd have to be done in the Bind message to make it reliable and available early enough.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: pie-in-sky idea: 'sensitive' function parameters

From
Chapman Flack
Date:
On 02/02/18 22:46, Tom Lane wrote:

> ... the problem with this idea is that knowledge that the item ought to be
> hidden would be obtained only very late in the parsing process.  So for
> example if you fat-fingered something just to the left of the function
> call in the query text, or the name of the function itself, your password
> would still get exposed in the log.

That didn't seem like a deal-breaker to me, as the common case I had
imagined for it would be where the query is coded into something (a webapp,
say) that just has some bits of sensitive data to pass in, and would surely
just be tested on dummy data until it was clear the canned query was at
least free of fat-finger issues.

Indeed, should the feature have to be restricted for practicality's sake
to only working with bound parameters and certain protocol variants, it
might not be easy or even possible to use it improvisationally from psql,
but that might be an acceptable limitation.

On 02/03/18 02:14, Craig Ringer wrote:

> About the only time I think it's really viable to pursue is if it's
> restricted to bind parameters. We only log those later and more
> selectively as it is, so it seems much more reasonable to say "I never
> want <parameter X> to appear in the logs".

And I think that could be an acceptable restriction. One way could use
a simple flag to accompany the parameter binding from the client, which
would be recognized early enough to keep the parameter out of the logs,
but also checked later when the function info is available, throwing an
error if it was used for a parameter not declared sensitive, or not used
for a parameter that is.

Or it could even have to be used in a prepared statement. A little
clunkier and one more round trip to the server, but probably not onerous,
and by the time PREPARE completes, surely it can be known which parameter
slots are declared that way.

Still wouldn't account for internal statements logged or errors thrown
by whatever the function does, but maybe the existing machinery for
declaring functions leakproof is at least a start on that?

-Chap


Re: pie-in-sky idea: 'sensitive' function parameters

From
Chapman Flack
Date:
Just giving this thread a bump in honor of the mention of sensitive
things in logs in the cryptography unconference session.

I'm not partisan about any of the particular syntax examples I gave
earlier, but it seems like there were two key ingredients:

1.  In what is sent from the client to the server, certain parameters
    are marked as sensitive in ways that are obvious early at parse time,
    before having to look up or analyze anything.

2.  But those markings are later compared to the actual declarations of
    things, once those are resolved. It is an error either to send as
    'sensitive' a parameter that isn't so declared, or to forget to send
    as 'sensitive' one that is.

The first error is to prevent evildoers using the facility to hide values
from the logs arbitrarily in the queries they are sending.

The second error is to catch mistakes during app development. It's possible
that a query sent by an app under development won't have the right things
marked 'sensitive' yet, and it's possible those queries get exposed in the
logs because the early parse-time indication that they shouldn't be is
missing.

But at that stage of development, the values being sent shouldn't really
be sensitive ones, and making such a query an error ensures such omissions
are caught and fixed.

Regards,
-Chap



Re: pie-in-sky idea: 'sensitive' function parameters

From
Robert Haas
Date:
On Fri, May 29, 2020 at 12:38 PM Chapman Flack <chap@anastigmatix.net> wrote:
> Just giving this thread a bump in honor of the mention of sensitive
> things in logs in the cryptography unconference session.

I agree that there's a problem here and I would like to see a solution
to it, too. Several EnterpriseDB customers have complained about the
fact that ALTER USER .. PASSWORD logs the password, which is a similar
kind of problem, though it involves DDL rather than DML.

But, I don't really see a way forward that doesn't involve breaking
stuff that works today. We have options to log queries before parsing
them. You can't redact sensitive details at that point because you
don't know whether the query contains any such details, or where they
are located. You can't postpone logging the query without changing the
behavior of the system in user-visible ways. What if parsing gets
stuck in an infinite loop and you want to look at the log to see which
query got stuck? What if you want to look at the timestamps on the log
messages to know when the server started dealing with some particular
query?

On a related note, what if parsing fails? Not only should the logging
that happens before parsing already have been done, but the parse
error itself should also log the query that triggered it, so that the
user has some chance of troubleshooting the situation. However, since
we haven't completed parsing, there's no way to redact what gets
logged in such a case; we can't know where the sensitive stuff is.
Now, you might say that you only care about redacting queries that are
well-formed, but parsing can fail for other reasons - e.g. because you
ran out of memory. You could argue that such cases are unlikely enough
that you don't care about them, but that doesn't mean somebody else
won't file a CVE.

Also, even if you are logging after parsing has been successfully
completed, does that mean that you know which parts of the query
contain sensitive information? Parse trees contain some positional
information, but I'm not sure that there's enough information there,
or that we have it in all the cases somebody might want to redact.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pie-in-sky idea: 'sensitive' function parameters

From
Chapman Flack
Date:
On 05/29/20 14:51, Robert Haas wrote:
> stuff that works today. We have options to log queries before parsing
> them. You can't redact sensitive details at that point because you
> don't know whether the query contains any such details, or where they
> are located. You can't postpone logging the query without changing the
> behavior of the system in user-visible ways. What if parsing gets
> stuck in an infinite loop and you want to look at the log to see which
> query got stuck? What if you want to look at the timestamps on the log

A possible step in the direction of good-enough would be to have
'sensitive' flags only in the parameter-bind message of the extended
protocol.

The query you send down has only placeholders. The sensitive stuff
comes in the bind message. Recognizing a 'sensitive' bit for a parameter
in a bind message ought to be entirely feasible. If parsing the query
failed and got logged because of the failure, the log record just
shows the placeholders.

Again, to protect from the other kind of misuse (a malicious query
hiding values from the log), there should be a check after parsing/
analysis that the values sent with 'sensitive' flags in the bind
message are exactly the ones that were so declared.

A stored function/procedure might have a way to declare some of its
parameters that way, with some default role controlling permission
to declare such a function.

Maybe you could even declare such parameters directly in the syntax
of preparing a query (with, again, control over who gets to prepare
such a query). There again, the correspondence of those declarations
to the bind message can't be verified until after parsing/analysis,
but meanwhile the sensitive values are clearly flagged in the bind
message, and any logging on a parse failure is still able to avoid
logging them.

Regards,
-Chap



Re: pie-in-sky idea: 'sensitive' function parameters

From
Tom Lane
Date:
Chapman Flack <chap@anastigmatix.net> writes:
> A possible step in the direction of good-enough would be to have
> 'sensitive' flags only in the parameter-bind message of the extended
> protocol.

Yeah, if we restrict our promises to being willing to not log (some)
bind parameters, that seems far more manageable, and reliable, than
figuring out what parts of query strings need to be hidden.

One missing part of that is that we'd need to support bind parameters
for utility statements, eg new password in ALTER USER.  That's been
on the wish list for a long time anyway, of course.  I think it's
mostly a matter of lack of round tuits, rather than any fundamental
problem.  (Parameters in transaction control statements might have
fundamental problems, but we can just dismiss that as out of scope.)

I don't have a lot of faith (not any, really) in your notion of tying
control of the feature to function arguments.  That falls down for
all of the simpler cases I can think of: aside from the ALTER USER
PASSWORD case, there's INSERT INTO accounts(..., creditcardnumber,
...) VALUES(..., $n, ...).  Neither one of those have a nearby UDF
to control it with.  Also, what would we do with something like
sensitive_function($1 || 'constant') ?

            regards, tom lane



Re: pie-in-sky idea: 'sensitive' function parameters

From
Robert Haas
Date:
On Fri, May 29, 2020 at 3:05 PM Chapman Flack <chap@anastigmatix.net> wrote:
> A possible step in the direction of good-enough would be to have
> 'sensitive' flags only in the parameter-bind message of the extended
> protocol.

Interesting idea. Changing the wire protocol for this sort of thing
makes it a much bigger lift, but it might help, at least in some
cases. It does however require that the user being using prepared
queries, and that the user know which data is sensitive. For instance,
if a user is sitting there in a psql shell and the requirement is that
if they happen to type ALTER USER .. PASSWORD the new password doesn't
get logged, this approach fails both because the user doesn't do
anything to identify the query as special, and also because it's not
prepared. So in a certain sense you could say that with this design
the server just passes the buck: it's too hard for us to figure out
which things to log, so we're making it your job to tell us....

Another point to consider is that I think we already have the ability
to suppress logging of bind parameters. So, people who want this sort
of thing and are able to use bind parameters can just not log them and
then all is well.That does have the deficiency that we then log NO
bind parameters, and somebody might want to log them in some
situations but not others, but perhaps there is a simpler way of
accomplishing that than what you've outlined here? I don't know. I'm
just throwing out ideas...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pie-in-sky idea: 'sensitive' function parameters

From
Robert Haas
Date:
On Fri, May 29, 2020 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One missing part of that is that we'd need to support bind parameters
> for utility statements, eg new password in ALTER USER.  That's been
> on the wish list for a long time anyway, of course.  I think it's
> mostly a matter of lack of round tuits, rather than any fundamental
> problem.  (Parameters in transaction control statements might have
> fundamental problems, but we can just dismiss that as out of scope.)

I might be wrong, but isn't this, like, a ton of work? All the places
in the DDL grammar that currently accept string literals would have to
be changed to also allow parameters, and then the downstream code
would need to be adjusted to look through those parameters to find the
corresponding values. Or is there a less painful approach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pie-in-sky idea: 'sensitive' function parameters

From
Chapman Flack
Date:
On 05/29/20 15:26, Tom Lane wrote:

> all of the simpler cases I can think of: aside from the ALTER USER
> PASSWORD case, there's INSERT INTO accounts(..., creditcardnumber,
> ...) VALUES(..., $n, ...).  Neither one of those have a nearby UDF
> to control it with.

I was thinking incrementally ... something about UDFs only might
be quickish to do as a PoC. And is already useful, because if exposure
of a particular thing bothers you enough, you can make a UDF or P to
control it with.

But ultimately, if ALTER USER PASSWORD has sensitivity of
its parameter hardcoded in, and CREATE TABLE ACCOUNTS can declare
creditcardnumber SENSITIVE, then maybe those bits go out to the client
in the parameter Describe message, and come back in the Bind message,
without the user even necessarily thinking about it.

Regards,
-Chap



Re: pie-in-sky idea: 'sensitive' function parameters

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, May 29, 2020 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One missing part of that is that we'd need to support bind parameters
>> for utility statements, eg new password in ALTER USER.  That's been
>> on the wish list for a long time anyway, of course.  I think it's
>> mostly a matter of lack of round tuits, rather than any fundamental
>> problem.  (Parameters in transaction control statements might have
>> fundamental problems, but we can just dismiss that as out of scope.)

> I might be wrong, but isn't this, like, a ton of work?

I'm not sure how much work, but yeah, there'd be work to do.

I don't think there are all that many places where we really have
just a string literal (of course, ALTER USER PASSWORD is a poster
child exception).  I think a large fraction of the interesting cases are
places where there's some amount of expression eval capability already,
eg parameters in EXECUTE.  Syntactically you can already do

execute foo(1, $1 + 2);

and the only part of that that doesn't work is passing in a parameter.

            regards, tom lane