Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION } - Mailing list pgsql-hackers

From Robert Haas
Subject Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
Date
Msg-id CA+TgmoZGu46an-Vkrbe2o0i2wOpjUFWX93fnVtQ2j8FT32gDQw@mail.gmail.com
Whole thread Raw
In response to Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
List pgsql-hackers
On Tue, Sep 19, 2023 at 7:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Good documentation includes some guidance. Sure, it should describe the
> system behavior, but without anchoring it to some kind of expected use
> case, it can be equally frustrating.

Fair.

> I don't mean to set some major new standard in the documentation that
> should apply to everything; but for the privilege system, even hackers
> are having trouble keeping up (myself included). A bit of guidance
> toward supported use cases helps a lot.

Yeah, this stuff is complicated, and I agree that it's hard even for
hackers to keep up with. I don't really have a strong view on the
concrete case you mentioned involving password_required. I always
worry that if there are three cases and we suggest one of them then
the others will be viewed negatively when really they're all equally
fine. On the other hand, that can often be addressed by starting the
sentence with "For example, you could...." or similar, so perhaps
there's no problem here at all. I generally agree with the idea that
examples can be useful for clarifying points that may otherwise be too
theoretical.

> I hope what I'm saying is not useless vitriol. I am offering the best
> solutions I see in a bad situation. And I believe I've uncovered some
> emergent behaviors that are not well-understood even among prominent
> hackers.

Yeah, I wasn't really intending to say that you were. I just get
nervous about statements like "don't ever do X!" because I find it
completely unconvincing. In my experience, when you tell people stuff
like that, some of them just go off and do it anyway and, especially
in a case like this, chances are very good that nothing bad will ever
happen to them, simply because most PostgreSQL installations don't
have malicious local users. When you tell them "but that's really bad"
they say "why?" and if the documentation doesn't have an answer to the
question well then that sucks.

> >  because it would break a lot of stuff for a lot
> > of people, but if we could get that then I think we could maybe
> > emerge
> > in a better spot once the pain of the compatibility break receded.
>
> Are there ways we can soften this a bit? I know compatibility GUCs are
> not to be added lightly, but perhaps one is justified here?

I don't know. I'm skeptical. This behavior is so complicated and hard
to get right. Having it GUC-dependent makes it even more confusing
than it is already. But I guess it also depends on what the GUC does.

Let's say we make a rule that every function or procedure has to have
a search_path attached to it as a function property. That is, CREATE
FUNCTION .. SEARCH something sets pg_proc.prosearch = 'something'. If
you omit the SEARCH clause, one is implicitly supplied for you. If you
say SEARCH NULL, then the function is executed with the search_path
taken from the GUC; SEARCH 'anything_else' specified a literal
search_path to be used.

In such a world, I can imagine having a GUC that determines whether
the implicitly supplied SEARCH clause is SEARCH
${WHATEVER_THE_SEARCH_PATH_IS_RIGHT_NOW} or SEARCH NULL. Such a GUC
only takes effect at CREATE FUNCTION time. However, I cannot imagine
having a GUC that causes the SEARCH clauses attached to all functions
to be globally ignored at execution time. That seems like a choice we
would likely regret bitterly. The first thing is already painful, but
the second one is exponentially worse, because in the first world, you
have to be careful to get your functions defined correctly, but if you
do, you know they'll run OK on any PostgreSQL cluster anywhere,
whereas in the second world, there's no way to define a function that
behaves the same way on every PostgreSQL instance. Imagine being an
extension author, for example.

I am a little worried that this kind of design might end up reversing
the direction of some security problems that we have now. For
instance, right now, if you call a function with a SET search_path
clause, you know that it can't make any changes to search_path that
survive function exit. You'll get your old search_path back. With this
kind of design, it seems like it would be a lot easier to get back to
the SQL toplevel and find the search_path surprisingly changed under
you. I think we have that problem in some cases already, though. I'm
unclear how much worse this makes it.

> > Another option is something around sandboxing and/or function trust.
> > The idea here is to not care too much about the search_path behavior
> > itself, and instead focus on the consequences, namely what code is
> > getting executed as which user and perhaps what kinds of operations
> > it's performing.
>
> I'm open to discussing that further, and it certainly may solve some
> problems, but it does not seem to solve the fundamental problem with
> search_path: that the caller can (intentionally or unintentionally)
> cause a function to do unexpected things.

Well, I think it's meant to solve that problem.  How effectively it
does so is a point worth debating.

> Sometimes an unexpected thing is not a the kind of thing that would be
> caught by a sandbox, e.g. just an unexpected function result. But if
> that function is used in a constraint or expression index, that
> unexpected result can lead to a violated constraint or a bad index
> (that will later cause wrong results). The owner of the table might
> reasonably consider that a privilege problem, if the user who causes
> the trouble had only INSERT privileges.

That's an interesting example. Earlier versions of the function trust
proposal proposed to block *any* execution of code belonging to an
untrusted party. That could potentially block this attack. However, it
would also block a lot of other things. For instance, if Alice tries
to insert into Bob's table and Bob's table has a CHECK constraint or
an index expression, Alice has to trust Bob or she can't insert
anything at all. By trusting Bob just enough to allow him do things
like CHECK(LENGTH(foo) < 10) or whatever, Alice can operate on Bob's
table without a problem in normal cases, but is still protected if Bob
suddenly starts doing something sneaky. I think that's a significant
improvement, because a system that is so stringent that it blocks even
completely harmless things is likely to get disabled, at which point
it protects nobody from anything.

However, that analysis presumes that what we're trying to do is
protect Alice from Bob, and I think you're raising the question of how
we protect Bob from Alice. Suppose Bob has got a trigger function but
has failed to control search_path for that function. Alice can set
search_path so that Bob's trigger calls some function or operator that
she owns instead of the intended call to, say, a system function or
operator. Some sufficiently-rigid function trust system could catch
this: Bob doesn't trust Alice, and so the fact that his code is trying
to call some a function or operator owned by Alice is a red flag. On
the basis of the fact that Bob doesn't trust Alice, we should error
out to protect Bob. Had the search_path been set in the expected way,
Bob would have been trying to call a superuser-owned function, and Bob
must trust the superuser, so the operation is permitted.

I wouldn't have a problem with a function-trust proposal that
incorporated a mode that rigid as a configuration option. I find this
a convincing example of how that could be useful. But such a mode has
pretty serious downsides, too. It makes it very difficult for one user
to interact with another user's objects in any way without triggering
security errors.

Also, in a case like this, I don't think it's unreasonable to ask
whether, perhaps, Bob just needs to be a little more careful about
setting search_path. I think that there is a big difference between
(a) defining a SQL-language function that is accessible to multiple
users and (b) inserting a row into a table you don't own. When you
define a function, you know people are potentially going to call it.
Asking you, as the function author, to take some care to secure your
function against a malicious search_path doesn't seem like an
unsupportable burden. After all, you control the definition of that
function. The problem with inserting a row into a table you don't own
is that all of the objects involved -- the table itself, its indexes,
its triggers, its defaults, its constraints -- are owned by somebody
else, and that user controls those objects and can change any of them
at any time. You can't really be expected to verify that all code
reachable as a result of an INSERT into the table is safe enough
before every INSERT into that table. You can, I think, be expected to
check that functions you define have SET search_path attached.

> > Are there any other categories of things we can do? More specific
> > kinds of things we can do in each category? I don't really see an
> > option other than (1) "change something in the system design so that
> > people use search_path wrongly less often" or (2) "make it so that it
> > doesn't matter as much if people using the wrong search_path" but
> > maybe I'm missing a clever idea.
>
> Perhaps there are some clever ideas about maintaining compatibility
> within the approaches (1) or (2), which might make one of them more
> appealing.

Indeed.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: how to do profile for pg?
Next
From: Robert Haas
Date:
Subject: Re: Questions about the new subscription parameter: password_required