Re: sandboxing untrusted code - Mailing list pgsql-hackers

From Robert Haas
Subject Re: sandboxing untrusted code
Date
Msg-id CA+TgmoboxfxK=XSAqXNtZBSusJFOyE0uYZeg0R-pi57BVg--Ww@mail.gmail.com
Whole thread Raw
In response to Re: sandboxing untrusted code  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: sandboxing untrusted code
List pgsql-hackers
On Thu, Aug 31, 2023 at 8:57 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > As a refresher, the scenario I'm talking about is any one in which
> > one
> > user, who I'll call Bob, does something that results in executing
> > code
> > provided by another user, who I'll call Alice. The most obvious way
> > that this can happen is if Bob performs some operation that targets a
> > table owned by Alice. That operation might be DML, like an INSERT or
> > UPDATE; or it might be some other kind of maintenance command that
> > can
> > cause code execution, like REINDEX, which can evaluate index
> > expressions.
>
> REINDEX executes index expressions as the table owner. (You are correct
> that INSERT executes index expressions as the inserting user.)

I was speaking here of who provided the code, rather than whose
credentials were used to execute it. The index expressions are
provided by the table owner no matter who evaluates them in a
particular case.

> > 1. Compute stuff. There's no restriction on the permissible amount of
> > compute; if you call untrusted code, nothing prevents it from running
> > forever.
> > 2. Call other code. This may be done by a function call or a command
> > such as CALL or DO, all subject to the usual permissions checks but
> > no
> > further restrictions.
> > 3. Access the current session state, without modifying it. For
> > example, executing SHOW or current_setting() is fine.
> > 4. Transiently modify the current session state in ways that are
> > necessarily reversed before returning to the caller. For example, an
> > EXCEPTION block or a configuration change driven by proconfig is
> > fine.
> > 5. Produce messages at any log level. This includes any kind of
> > ERROR.
>
> Nothing in that list really exercises privileges (except #2?). If those
> are the allowed set of things a sandboxed function can do, is a
> sandboxed function equivalent to a function running with no privileges
> at all?

Close but not quite. As you say, #2 does exercise privileges. Also,
even if no privileges are exercised, you could still refer to
CURRENT_ROLE, and I think you could also call a function like
has_table_privilege.  Your identity hasn't changed, but you're
restricted from exercising some of your privileges. Really, you still
have them, but they're just not available to you in that situation.

> Please explain #2 in a bit more detail. Whose EXECUTE privileges would
> be used (I assume it depende on SECURITY DEFINER/INVOKER)? Would the
> called code also be sandboxed?

Nothing in this proposed system has any impact on whose privileges are
used in any particular context, so any privilege checks conducted
pursuant to #2 are performed as the same user who would perform them
today. Whether the called code would be sandboxed depends on how the
rules I articulated in the previous email would apply to it. Since
those rules depend on the user IDs, if the called code is owned by the
same user as the calling code and is SECURITY INVOKER, then those
rules apply in the same way and the same level of sandboxing will
apply. But if the called function is owned by a different user or is
SECURITY DEFINER, then the rules might apply differently to the called
code than the calling code. It's possible this isn't quite good enough
and that some adjustments to the rules are necessary; I'm not sure.

> Let me qualify that: if the function is written by Alice, and the code
> is able to really exercise the privileges of the caller (Bob), then it
> seems really hard to make it safe for the caller.
>
> If the function is sandboxed such that it's not really using Bob's
> privileges (it's just nominally running as Bob) that's a much more
> tractable problem.

Agreed.

> One complaint (not an objection, because I don't think we have
> the luxury of objecting to viable proposals when it comes to improving
> our security model):
>
> Although your proposal sounds like a good security backstop, it feels
> like it's missing the point that there are different _kinds_ of
> functions. We already have the IMMUTABLE marker and we already have
> runtime checks to make sure that immutable functions can't CREATE
> TABLE; why not build on that mechanism or create new markers?

I haven't ruled that out completely, but there's some subtlety here
that doesn't exist in those other cases. If the owner of a function
marks it wrongly in terms of volatility or parallel safety, then they
might make queries run more slowly than they should, or they might
make queries return wrong answers, or error out, or even end up with
messed-up indexes. But none of that threatens the stability of the
system in any very deep way, or the security of the system. It's no
different than putting a CHECK (false) constraint on a table, or
something like that: it might make the system not work, and if that
happens, then you can fix it. Here, however, we can't trust the owners
of functions to label those functions accurately. It won't do for
Alice to create a function and then apply the NICE_AND_SAFE marker to
it. That defeats the whole point. We need to know the real behavior of
Alice's function, not the behavior that Alice says it has.

Now, in the case of a C function, things are a bit different. We can't
inspect the generated machine code and know what the function does,
because of that pesky halting problem. We could handle that either
through function labeling, since only superusers can create C
functions, or by putting checks directly in the C code. I was somewhat
inclined toward the latter approach, but I'm not completely sure yet
what makes sense. Thinking about your comments here made me realize
that there are other procedural languages to worry about, too, like
PL/python or PL/perl or PL/sh. Whatever we do for the C functions will
have to be extended to those cases somehow as well. If we label
functions, then we'll have to allow superusers only to label functions
in these languages as well and make the default label "this is
unsafe." If we put checks in the C code then I guess any given PL
needs to certify that it knows about sandboxing or have all of its
functions treated as unsafe. I think doing this at the C level would
be better, strictly speaking, because it's more granular. Imagine a
function that only conditionally does some prohibited action - it can
be allowed to work in the cases where it does not attempt the
prohibited operation, and blocked when it does. Labeling is
all-or-nothing.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Next
From: Amit Kapila
Date:
Subject: Re: Fix a typo in decode.c