Re: security_definer_search_path GUC - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: security_definer_search_path GUC
Date
Msg-id CAFj8pRA9_seOtHQfCQncPrjz2OBUyK6u4-u5=0=96p4s4U-28A@mail.gmail.com
Whole thread Raw
In response to Re: security_definer_search_path GUC  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers


st 2. 6. 2021 v 14:46 odesílatel Joel Jacobson <joel@compiler.org> napsal:
On Wed, Jun 2, 2021, at 09:07, Pavel Stehule wrote:
st 2. 6. 2021 v 8:45 odesílatel Joel Jacobson <joel@compiler.org> napsal:
'search_path' is a bit like a global variable in C, that can change the behaviour of the SQL commands executed.
It makes unqualified SQL code context-sensitive; you don't know by looking at a piece of code what objects are referred to, you also need to figure out what the active search_path is at this place in the code.

sometimes this is wanted feature - some sharding is based on this

set search_path = 'custormerx'

Oh, interesting, didn't know abou that one. Is that recommended best practise, or more of a hack?

I have not any statistics, but I think it was relatively common until we had good partitioning.  I know two big customers from Czech Republic.

Some people use schema as a database - without overhead of system catalogue and without necessity of reconnects to other databases.

Using search_path is very common for applications ported from Oracle.



I also think we can never get rid of search_path by default, since so much legacy depend on it.
But I think it would be good to provide a way to effectively uninstall the search_path for users who prefer to do so, in databases where it's possible, and where clarity and safety is desired.


'public' schema if used (without ever changing the default 'search_path'), allows creating unqualified database objects, which I think can be useful in at least three situations:

1) when the application is a monolith inside a company, when there is only one version of the database, i.e. not having to worry about name collision with other objects in some other version, since the application is hidden in the company and the schema design is not exposed to the public

2) when installing a extension that uses schemas, when wanting the convenience of unqualified access to some functions frequently used, instead of adding its schema to the search_path for convenience, one can instead add wrapper-functions in the 'public' schema. This way, all internal functions in the extension, that are not meant to be executed by users, are still hidden in its schema and won't bother anyone (i.e. can't cause unexpected conflicts). Of course, access can also be controlled via REVOKE EXECUTE ... FROM PUBLIC for such internal functions, which is probably a good idea as well.
In a similar way, specific tables in the extension's schema can be made unqualified as well by adding simple views, installed in the public schema, if insisting on unqualified convenience.

In conclusion:
The main difference is 'public' makes it possible to make *specific* objects unqualified,
while 'search_path' makes *all* objects in such schema(s) unqualified.

These arguments are valid, but I think so it is not all. If you remove search_path, then the "public" schema will be overused.

What makes you think that? If a database object is to be accessed unqualified by all users, isn't the 'public' schema a perfect fit for it? How will it be helpful to create different database objects in different schemas, if also adding all such schemas to the search_path so they can be accessed unqualified? In such a scenario you risk unintentionally creating conflicting objects, and whatever schema happened to be first in the search_path will be resolved. Seems insecure and messy to me.
Much safer to install objects that you want to access unqualified in 'public', and get an error if you try to create a new object with a conflicting name of an existing one.

I think people usually prefer simple solutions -  like use for all public or use for all schemas.



I think we should ask - who can change the search path and how. Now, there are not any limits. I can imagine the situation when search_path can be changed by only some dedicated role - it can be implemented in a security definer function. Or another solution, we can fix the search path to one value, or only a few possibilities.
 
Maybe for your purpose is just enough to introduce syntax for defining all possibilities of search path:

search_path = "public" # now, just default
search_path = ["public"] # future - define vector of possible values of search path - in this case, only "public" is allowed - and if you want to change it, you should be database owner

or there can be hook for changing search_path, and it can be implemented dynamically in extension

Not bad ideas. I think they would improve the situation. Maybe it could even be a global immutable constant value, the same for all users, that could only be set upon initdb, similar to how encoding can only be set via initdb.

initdb --search_path "pg_catalog, public, pg_temp" foobar

But perhaps the search_path as an uninstallable extension is a less invasive idea.

Looking at the code, this seems to be the commit that introduced search_path back in 2002:

I'm not sure how difficult it would be to extract search_path into an extension.
Doesn't look to be that much code. Here is the initial commit that introduced the concept.
But perhaps it's more complex today due to new dependencies.

commit 838fe25a9532ab2e9b9b7517fec94e804cf3da81
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Apr 1 03:34:27 2002 +0000

    Create a new GUC variable search_path to control the namespace search
    path.  The default behavior if no per-user schemas are created is that
    all users share a 'public' namespace, thus providing behavior backwards
    compatible with 7.2 and earlier releases.  Probably the semantics and
    default setting will need to be fine-tuned, but this is a start.

But search_path is not the only problem. I think it's also a problem objects with the same identifies can be created in both pg_catalog and public. Can we think of a valid reason why it is a good idea to continue to allow that? In what real-life scenario is it needed?

Probably it has not sense, but there is simple implementation - you can use just unique index(schema name, object name), and you don't need any other locks and checks

Pavel



/Joel




pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pg_stat_progress_create_index vs. parallel index builds
Next
From: Julien Rouhaud
Date:
Subject: Re: security_definer_search_path GUC