Thread: search_path for PL/pgSQL functions partially cached?
Hello, I'm experiencing some weird issues when running the following code in a psql session: ============ CREATE TABLE "tbl" ("col" NUMERIC(15, 0)); CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN RETURN '2.4'; END; $$; BEGIN; CREATE SCHEMA "myschema"; SET LOCAL search_path TO 'myschema'; CREATE TABLE "tbl" ("col" NUMERIC); CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN RETURN '5.4'; END; $$; CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$ DECLARE "variable" "tbl"."col"%TYPE; BEGIN "variable" := "foo"(); RETURN "variable"; END; $$; COMMIT; SELECT "myschema"."run"(); -- returns '2.4' (when run in the same session) -- reconnect to database here: \c SELECT "myschema"."run"(); -- returns '2' SET search_path TO 'myschema'; SELECT "myschema"."run"(); -- returns '5' -- reconnect to database again: \c SET search_path TO 'myschema'; SELECT "myschema"."run"(); -- returns '5.4' SET search_path TO 'public'; SELECT "myschema"."run"(); -- returns '2.4' again ============ I'm using PostgreSQL verison 16.4. Is this the expected behavior? If yes, where is this documented? If no, what would be the expected behavior? Of course, I could fix this by fully qualifying the table name "tbl" in the function. Nonetheless, I'm not really sure what'sgoing on here. It seems that it matters *both* how the search_path was set during the *first* invocation of the function within a session*and* how it is set during the actual call of the function. So even if there are just two schemas involved, thereare 4 possible outcomes for the "run" function's result ('2.4', '2', '5', and '5.4'). To me, this behavior seems tobe somewhat dangerous. Maybe it is even considered a bug? Or is it documented somewhere? I remember running into some problemslike that in the past already, but unfortunately, I don't remember details. I suppose this is because there is some caching mechanism in place. But apparently it only caches the "tbl"."col"%TYPE andnot the "foo"() function call expression. Can someone explain to me what's going on, and what is the best practice todeal with it? Is there a way to avoid fully qualifying every type and expression? Which parts do I have to qualify or isthis something that could be fixed in a future version of PostgreSQL? Many thanks and kind regards, Jan Behrens
On Friday, December 27, 2024, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
It seems that it matters *both* how the search_path was set during the *first* invocation of the function within a session *and* how it is set during the actual call of the function. So even if there are just two schemas involved, there are 4 possible outcomes for the "run" function's result ('2.4', '2', '5', and '5.4'). To me, this behavior seems to be somewhat dangerous. Maybe it is even considered a bug?
It is what it is - and if one is not careful one can end up writing hard-to-understand and possibly buggy code due to the various execution environments and caches involved.
I’ve never really understood why “%TYPE’ exists…
Or is it documented somewhere?
Can someone explain to me what's going on, and what is the best practice to deal with it? Is there a way to avoid fully qualifying every type and expression? Which parts do I have to qualify or is this something that could be fixed in a future version of PostgreSQL?
Add qualification or attach a “set search_path” clause to “create function”. Code stored in the server should not rely on the session search_path.
David J.
Hi
pá 27. 12. 2024 v 21:26 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Friday, December 27, 2024, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
It seems that it matters *both* how the search_path was set during the *first* invocation of the function within a session *and* how it is set during the actual call of the function. So even if there are just two schemas involved, there are 4 possible outcomes for the "run" function's result ('2.4', '2', '5', and '5.4'). To me, this behavior seems to be somewhat dangerous. Maybe it is even considered a bug?It is what it is - and if one is not careful one can end up writing hard-to-understand and possibly buggy code due to the various execution environments and caches involved.
I think plan cache should be invalidated when search_path is different, but maybe there is some bug - there are some optimizations related to faster execution of simple expressions.
I’ve never really understood why “%TYPE’ exists…
referenced types should increase readability - it ensures type compatibility - minimally on oracle, where the change of schema requires recompilation. In Postgres it is working on 99% - plpgsql functions don't hold dependency on types.
Or is it documented somewhere?Can someone explain to me what's going on, and what is the best practice to deal with it? Is there a way to avoid fully qualifying every type and expression? Which parts do I have to qualify or is this something that could be fixed in a future version of PostgreSQL?Add qualification or attach a “set search_path” clause to “create function”. Code stored in the server should not rely on the session search_path.
a lot of functionality in Postgres depends on the search path - and then all should be consistent. Sure, writing procedures that depend on the current search path can be a short way to hell.
I cannot to reproduce it
CREATE OR REPLACE FUNCTION s1.fx1()
RETURNS integer
LANGUAGE plpgsql
AS $function$
begin
return 100;
end
$function$
RETURNS integer
LANGUAGE plpgsql
AS $function$
begin
return 100;
end
$function$
CREATE OR REPLACE FUNCTION s2.fx1()
RETURNS integer
LANGUAGE plpgsql
AS $function$
begin
return 200;
end
$function$
RETURNS integer
LANGUAGE plpgsql
AS $function$
begin
return 200;
end
$function$
CREATE OR REPLACE FUNCTION public.foo()
RETURNS void
LANGUAGE plpgsql
AS $function$
RETURNS void
LANGUAGE plpgsql
AS $function$
declare v int;
begin v := fx1();
raise notice '%', v;
end;
$function$
begin v := fx1();
raise notice '%', v;
end;
$function$
(2024-12-27 21:53:13) postgres=# set search_path to s1;
SET
(2024-12-27 21:53:34) postgres=# select public.foo();
NOTICE: 100
┌─────┐
│ foo │
╞═════╡
│ │
└─────┘
(1 row)
(2024-12-27 21:53:44) postgres=# set search_path to s2;
SET
(2024-12-27 21:53:47) postgres=# select public.foo();
NOTICE: 200
┌─────┐
│ foo │
╞═════╡
│ │
└─────┘
(1 row)
(2024-12-27 21:53:48) postgres=# set search_path to s1;
SET
(2024-12-27 21:53:51) postgres=# select public.foo();
NOTICE: 100
┌─────┐
│ foo │
╞═════╡
│ │
└─────┘
(1 row)
SET
(2024-12-27 21:53:34) postgres=# select public.foo();
NOTICE: 100
┌─────┐
│ foo │
╞═════╡
│ │
└─────┘
(1 row)
(2024-12-27 21:53:44) postgres=# set search_path to s2;
SET
(2024-12-27 21:53:47) postgres=# select public.foo();
NOTICE: 200
┌─────┐
│ foo │
╞═════╡
│ │
└─────┘
(1 row)
(2024-12-27 21:53:48) postgres=# set search_path to s1;
SET
(2024-12-27 21:53:51) postgres=# select public.foo();
NOTICE: 100
┌─────┐
│ foo │
╞═════╡
│ │
└─────┘
(1 row)
so from my perspective is pg ok, tested on pg16 and pg18
David J.
On 12/27/24 12:26, David G. Johnston wrote: > On Friday, December 27, 2024, Jan Behrens <jbe-mlist@magnetkern.de > <mailto:jbe-mlist@magnetkern.de>> wrote: > > > It seems that it matters *both* how the search_path was set during > the *first* invocation of the function within a session *and* how it > is set during the actual call of the function. So even if there are > just two schemas involved, there are 4 possible outcomes for the > "run" function's result ('2.4', '2', '5', and '5.4'). To me, this > behavior seems to be somewhat dangerous. Maybe it is even considered > a bug? > > > It is what it is - and if one is not careful one can end up writing > hard-to-understand and possibly buggy code due to the various execution > environments and caches involved. > > I’ve never really understood why “%TYPE’ exists… Per: https://www.postgresql.org/docs/current/plpgsql-declarations.html#PLPGSQL-DECLARATION-TYPE "By using %TYPE you don't need to know the data type of the structure you are referencing, and most importantly, if the data type of the referenced item changes in the future (for instance: you change the type of user_id from integer to real), you might not need to change your function definition. %TYPE is particularly valuable in polymorphic functions, since the data types needed for internal variables can change from one call to the next. Appropriate variables can be created by applying %TYPE to the function's arguments or result placeholders." The second case I can buy, the first I am not so sure of. It seems to me the first case it can be 'solved' by the second case. > > > Or is it documented somewhere? > > > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING <https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING> > > Can someone explain to me what's going on, and what is the best > practice to deal with it? Is there a way to avoid fully qualifying > every type and expression? Which parts do I have to qualify or is > this something that could be fixed in a future version of PostgreSQL? > > > Add qualification or attach a “set search_path” clause to “create > function”. Code stored in the server should not rely on the session > search_path. > > David J. > -- Adrian Klaver adrian.klaver@aklaver.com
"David G. Johnston" <david.g.johnston@gmail.com> writes: > It is what it is - and if one is not careful one can end up writing > hard-to-understand and possibly buggy code due to the various execution > environments and caches involved. Yeah, I don't see this changing. The actual answer is that we have search_path-aware caching of expressions and query plans within a plpgsql function, which is why the call to foo() reacts to the current search path. But the types of plpgsql variables are only looked up on the first use (within a session). Perhaps we ought to work harder on that, but it seems like a lot of overhead to add for something that will benefit next to nobody. > I’ve never really understood why “%TYPE’ exists… Compatibility with Oracle, I imagine. I agree it's a bizarre feature. But you could get the same behavior without %TYPE, just by referencing some other type that has different declarations in different schemas. > Add qualification or attach a “set search_path” clause to “create > function”. Code stored in the server should not rely on the session > search_path. Yeah, adding "set search_path" is recommendable if you don't want to think hard about this stuff. regards, tom lane
pá 27. 12. 2024 v 22:03 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> It is what it is - and if one is not careful one can end up writing
> hard-to-understand and possibly buggy code due to the various execution
> environments and caches involved.
Yeah, I don't see this changing. The actual answer is that we have
search_path-aware caching of expressions and query plans within a
plpgsql function, which is why the call to foo() reacts to the current
search path. But the types of plpgsql variables are only looked up
on the first use (within a session). Perhaps we ought to work harder
on that, but it seems like a lot of overhead to add for something that
will benefit next to nobody.
> I’ve never really understood why “%TYPE’ exists…
Compatibility with Oracle, I imagine. I agree it's a bizarre feature.
But you could get the same behavior without %TYPE, just by referencing
some other type that has different declarations in different schemas.
This feature is not bizarre - just the implementation in Postgres is not fully complete (and I am not sure if it is fixable). PLpgSQL uses plan cache, but there is nothing similar for types.
It is designed for Oracle where search_path doesn't exist, and where change of schema invalidates code, and requires recompilation. PL/pgSQL and
Postgres are much more dynamic systems than Oracle. Maybe PL/pgSQL functions can holds dependency on types, and when any related custom type
is changed, then the cached function can be invalidated. Unfortunately, the frequent change of search path can kill the performance.
> Add qualification or attach a “set search_path” clause to “create
> function”. Code stored in the server should not rely on the session
> search_path.
Yeah, adding "set search_path" is recommendable if you don't want to
think hard about this stuff.
regards, tom lane
On Fri, 27 Dec 2024 13:26:28 -0700 "David G. Johnston" <david.g.johnston@gmail.com> wrote: > > Or is it documented somewhere? > > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING I can't find any notes regarding functions and schemas in that section. > Can someone explain to me what's going on, and what is the best practice to > > deal with it? Is there a way to avoid fully qualifying every type and > > expression? Which parts do I have to qualify or is this something that > > could be fixed in a future version of PostgreSQL? > > > > Add qualification or attach a “set search_path” clause to “create > function”. Code stored in the server should not rely on the session > search_path. > > David J. In my (real world) case, I was unable to use "SET search_path FROM CURRENT" because it isn't possible to use "SET" in procedures that use transactions, due to this documented limitation: "If a SET clause is attached to a procedure, then that procedure cannot execute transaction control statements (for example, COMMIT and ROLLBACK, depending on the language)." https://www.postgresql.org/docs/17/sql-createprocedure.html My procedure looks more or less like this: CREATE PROCEDURE "myfunc"() LANGUAGE plpgsql AS $$ DECLARE "old_search_path" TEXT; -- some more variables BEGIN SELECT current_setting('search_path') INTO "old_search_path"; SET search_path TO 'myschema'; -- some code that uses COMMIT and SET TRANSACTION ISOLATION LEVEL PERFORM set_config('search_path', "old_search_path", FALSE); END; $$; My question is: Am I safe if I use fully-qualified types in the DECLARE section only? Or do I need to provide full qualification also in the code below (after SET search_path TO 'myschema')? And bonus question: Is it documented somewhere? Maybe not many people run into these issues because schemas and functions aren't used as often in combination? Kind Regards Jan Behrens
Hi
Maybe not many people run into these issues because schemas and
functions aren't used as often in combination?
I think schema and functions are common combinations. But when people have objects with the same name, then they are careful to be sure, so objects have really identical structure.
Using different types in these objects is very rare. And because Postgres doesn't support it well, experienced developers don't use it. Similar issues can do some issues after an stored procedures update, because can require session reset. Or when you need it, you can use a much more dynamic type like record.
Kind Regards
Jan Behrens
On Sat, 28 Dec 2024 00:40:09 +0100 Jan Behrens <jbe-mlist@magnetkern.de> wrote: > On Fri, 27 Dec 2024 13:26:28 -0700 > "David G. Johnston" <david.g.johnston@gmail.com> wrote: > > > > Or is it documented somewhere? > > > > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING > > I can't find any notes regarding functions and schemas in that section. Actually, I found another note in the documentation. But it doesn't explain things correctly. In the documentation for PostgreSQL 17, section 36.17.6.1. (Security Considerations for Extension Functions) says: "SQL-language and PL-language functions provided by extensions are at risk of search-path-based attacks when they are executed, since parsing of these functions occurs at execution time not creation time." https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-SECURITY So here, the manual explicity states that functions are parsed at execution, not creation time. As seen in my original example in this thread, this isn't (fully) true. Moreover, it isn't true for all SQL-language functions, as can be demonstrated with the following code: ============ CREATE SCHEMA s1; CREATE SCHEMA s2; CREATE VIEW s1.v AS SELECT 'creation' AS col; CREATE VIEW s2.v AS SELECT 'runtime' AS col; SET search_path TO 'public', 's1'; CREATE FUNCTION use_sql_atomic() RETURNS TEXT LANGUAGE sql BEGIN ATOMIC SELECT 'use_sql_atomic = ' || col FROM v; END; CREATE FUNCTION use_sql_string() RETURNS TEXT LANGUAGE sql AS $$ SELECT 'use_sql_string = ' || col FROM v; $$; CREATE FUNCTION use_plpgsql() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN RETURN (SELECT 'use_plpgsql = ' || col FROM v); END; $$; SET search_path TO 'public', 's2'; SELECT use_sql_atomic() AS "output" UNION ALL SELECT use_sql_string() AS "output" UNION ALL SELECT use_plpgsql() AS "output"; ============ This generates the following output: output --------------------------- use_sql_atomic = creation use_sql_string = runtime use_plpgsql = runtime (3 rows) Overall, PostgreSQL doesn't behave consistent, and to me it seems that the documentation isn't describing its behavior correctly either. I understand if fixing this is too much work (even though I would really like to see this fixed). But given that the current behavior is highly surprising and inconsistent - and keeping in mind that this is a subject that may affect security - I think the documentation should reflect the current behavior at least. I thus see this as a documentation issue. Kind regards, Jan Behrens
On 1/1/25 09:55, Jan Behrens wrote: > On Sat, 28 Dec 2024 00:40:09 +0100 > Jan Behrens <jbe-mlist@magnetkern.de> wrote: > >> On Fri, 27 Dec 2024 13:26:28 -0700 >> "David G. Johnston" <david.g.johnston@gmail.com> wrote: >> >>>> Or is it documented somewhere? >>> >>> https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING >> >> I can't find any notes regarding functions and schemas in that section. > > Actually, I found another note in the documentation. But it doesn't > explain things correctly. In the documentation for PostgreSQL 17, > section 36.17.6.1. (Security Considerations for Extension Functions) > says: > > "SQL-language and PL-language functions provided by extensions are at > risk of search-path-based attacks when they are executed, since parsing > of these functions occurs at execution time not creation time." > > https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-SECURITY > > So here, the manual explicity states that functions are parsed at > execution, not creation time. As seen in my original example in this > thread, this isn't (fully) true. Moreover, it isn't true for all > SQL-language functions, as can be demonstrated with the following code: > > ============ > > CREATE SCHEMA s1; > CREATE SCHEMA s2; > > CREATE VIEW s1.v AS SELECT 'creation' AS col; > CREATE VIEW s2.v AS SELECT 'runtime' AS col; > > SET search_path TO 'public', 's1'; > > CREATE FUNCTION use_sql_atomic() RETURNS TEXT LANGUAGE sql BEGIN ATOMIC > SELECT 'use_sql_atomic = ' || col FROM v; > END; > > CREATE FUNCTION use_sql_string() RETURNS TEXT LANGUAGE sql AS $$ > SELECT 'use_sql_string = ' || col FROM v; > $$; > > CREATE FUNCTION use_plpgsql() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN > RETURN (SELECT 'use_plpgsql = ' || col FROM v); > END; $$; > > SET search_path TO 'public', 's2'; > > SELECT use_sql_atomic() AS "output" UNION ALL > SELECT use_sql_string() AS "output" UNION ALL > SELECT use_plpgsql() AS "output"; > > ============ > > This generates the following output: > > output > --------------------------- > use_sql_atomic = creation > use_sql_string = runtime > use_plpgsql = runtime > (3 rows) > > Overall, PostgreSQL doesn't behave consistent, and to me it seems that > the documentation isn't describing its behavior correctly either. https://www.postgresql.org/docs/current/sql-createfunction.html "sql_body The body of a LANGUAGE SQL function. This can either be a single statement RETURN expression or a block BEGIN ATOMIC statement; statement; ... statement; END This is similar to writing the text of the function body as a string constant (see definition above), but there are some differences: This form only works for LANGUAGE SQL, the string constant form works for all languages. This form is parsed at function definition time, the string constant form is parsed at execution time; therefore this form cannot support polymorphic argument types and other constructs that are not resolvable at function definition time. This form tracks dependencies between the function and objects used in the function body, so DROP ... CASCADE will work correctly, whereas the form using string literals may leave dangling functions. Finally, this form is more compatible with the SQL standard and other SQL implementations. " > > I understand if fixing this is too much work (even though I would > really like to see this fixed). But given that the current behavior is > highly surprising and inconsistent - and keeping in mind that this is a > subject that may affect security - I think the documentation should > reflect the current behavior at least. I thus see this as a > documentation issue. > > Kind regards, > Jan Behrens > > -- Adrian Klaver adrian.klaver@aklaver.com
On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
On Sat, 28 Dec 2024 00:40:09 +0100
Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> On Fri, 27 Dec 2024 13:26:28 -0700
> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> > > Or is it documented somewhere?
> >
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
>
> I can't find any notes regarding functions and schemas in that section.
"Because PL/pgSQL saves prepared statements and sometimes execution plans in this way, SQL commands that appear directly in a PL/pgSQL function must refer to the same tables and columns on every execution; that is, you cannot use a parameter as the name of a table or column in an SQL command."
Changing search_path is just one possible way to change out which object a name tries to refer to so it is not called out explicitly.
"SQL-language and PL-language functions provided by extensions are at
risk of search-path-based attacks when they are executed, since parsing
of these functions occurs at execution time not creation time."
Moreover, it isn't true for all
SQL-language functions, as can be demonstrated with the following code:
Yeah, when we added a second method to write an SQL-language function, one that doesn't simply accept a string body, we didn't update that section to point out that is the string input variant of create function that is affected in this manner, the non-string (atomic) variant stores the result of parsing the inline code as opposed to storing the raw text.
David J.
On Wed, 1 Jan 2025 11:19:32 -0700 "David G. Johnston" <david.g.johnston@gmail.com> wrote: > On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens <jbe-mlist@magnetkern.de> wrote: > > > On Sat, 28 Dec 2024 00:40:09 +0100 > > Jan Behrens <jbe-mlist@magnetkern.de> wrote: > > > > > On Fri, 27 Dec 2024 13:26:28 -0700 > > > "David G. Johnston" <david.g.johnston@gmail.com> wrote: > > > > > > > > Or is it documented somewhere? > > > > > > > > > > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING > > > > > > I can't find any notes regarding functions and schemas in that section. > > > > > "Because PL/pgSQL saves prepared statements and sometimes execution plans > in this way, SQL commands that appear directly in a PL/pgSQL function must > refer to the same tables and columns on every execution; that is, you > cannot use a parameter as the name of a table or column in an SQL command." > > Changing search_path is just one possible way to change out which object a > name tries to refer to so it is not called out explicitly. The first part of the cited sentence seems helpful ("you must always refer to the same tables and columns on every execution"). I would thus conclude that using a dynamic search_path when running functions or procedures is *always* considered errorneous (even though not reported by the database as an error), except when using EXECUTE. I wonder if the database could/should generate an error (or at least a warning?) when a function or procedure without a "SET search_path" statement uses a non-qualified name? According to the documentation using a dynamic search_path to refer to different entities in the database is a case that "must" not happen. But following through, this might lead to more warnings one might expect, e.g. when using simple operators such as "=" or the "IN" or "CASE expression WHEN" statements, as these rely on the search_path as well. Should such code be considered non-idiomatic, dangerous, or even errorneous if a "SET search_path" option is missing in the function's/procedure's definition? Maybe I'm overthinking this. But in practice, I've been running into surprising issues whenever functions and schemas are involved, and I'm not sure if every programmer will be aware of how important it is to properly set a search_path in the function's defintion after reading the documentation. (Besides, it's not always possible in procedures.) > > > "SQL-language and PL-language functions provided by extensions are at > > risk of search-path-based attacks when they are executed, since parsing > > of these functions occurs at execution time not creation time." > > > Moreover, it isn't true for all > > SQL-language functions, as can be demonstrated with the following code: > > Yeah, when we added a second method to write an SQL-language function, one > that doesn't simply accept a string body, we didn't update that section to > point out that is the string input variant of create function that is > affected in this manner, the non-string (atomic) variant stores the result > of parsing the inline code as opposed to storing the raw text. > > David J. I missed that other part in the manual (which is in a totally different section). Should I report the missing update in section 36.17.6.1. of the documentation as a documentation issue, or is it not necessary? Kind regards, Jan Behrens
Hi
čt 2. 1. 2025 v 11:37 odesílatel Jan Behrens <jbe-mlist@magnetkern.de> napsal:
On Wed, 1 Jan 2025 11:19:32 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:
> On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
>
> > On Sat, 28 Dec 2024 00:40:09 +0100
> > Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> >
> > > On Fri, 27 Dec 2024 13:26:28 -0700
> > > "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> > >
> > > > > Or is it documented somewhere?
> > > >
> > > >
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> > >
> > > I can't find any notes regarding functions and schemas in that section.
> >
> >
> "Because PL/pgSQL saves prepared statements and sometimes execution plans
> in this way, SQL commands that appear directly in a PL/pgSQL function must
> refer to the same tables and columns on every execution; that is, you
> cannot use a parameter as the name of a table or column in an SQL command."
>
> Changing search_path is just one possible way to change out which object a
> name tries to refer to so it is not called out explicitly.
The first part of the cited sentence seems helpful ("you must always
refer to the same tables and columns on every execution"). I would thus
conclude that using a dynamic search_path when running functions or
procedures is *always* considered errorneous (even though not reported
by the database as an error), except when using EXECUTE.
I wonder if the database could/should generate an error (or at least a
warning?) when a function or procedure without a "SET search_path"
statement uses a non-qualified name? According to the documentation
using a dynamic search_path to refer to different entities in the
database is a case that "must" not happen.
But following through, this might lead to more warnings one might
expect, e.g. when using simple operators such as "=" or the "IN" or
"CASE expression WHEN" statements, as these rely on the search_path as
well. Should such code be considered non-idiomatic, dangerous, or even
errorneous if a "SET search_path" option is missing in the
function's/procedure's definition?
Maybe I'm overthinking this. But in practice, I've been running into
surprising issues whenever functions and schemas are involved, and I'm
not sure if every programmer will be aware of how important it is to
properly set a search_path in the function's defintion after reading
the documentation. (Besides, it's not always possible in procedures.)
How can you identify unwanted usage of non qualified identifiers from wanted usage of non qualified identifiers? It is a common pattern for sharding. Using not qualified identifiers of operators, functions is common when you are using orafce extensions, etc.
Using qualified identifiers everywhere strongly reduces readability. There are no aliases to the schema, so aliases cannot help.
you can identify the functions where search_path is not explicitly assigned
select oid::regprocedure
from pg_proc
where pronamespace::regnamespace not in ('pg_catalog', 'information_schema')
and not exists(select 1 from unnest(proconfig) g(v) where v ~ '^search_path');
Regards
Pavel
>
> > "SQL-language and PL-language functions provided by extensions are at
> > risk of search-path-based attacks when they are executed, since parsing
> > of these functions occurs at execution time not creation time."
>
> > Moreover, it isn't true for all
> > SQL-language functions, as can be demonstrated with the following code:
>
> Yeah, when we added a second method to write an SQL-language function, one
> that doesn't simply accept a string body, we didn't update that section to
> point out that is the string input variant of create function that is
> affected in this manner, the non-string (atomic) variant stores the result
> of parsing the inline code as opposed to storing the raw text.
>
> David J.
I missed that other part in the manual (which is in a totally different
section). Should I report the missing update in section 36.17.6.1. of
the documentation as a documentation issue, or is it not necessary?
Kind regards,
Jan Behrens
On Thu, 2 Jan 2025 12:40:59 +0100 Pavel Stehule <pavel.stehule@gmail.com> wrote: > How can you identify unwanted usage of non qualified identifiers from > wanted usage of non qualified identifiers? It is a common pattern for > sharding. Using not qualified identifiers of operators, functions is common > when you are using orafce extensions, etc. I don't fully understand the use-case. Could you elaborate? As I understand, even if identifiers are not fully-qualified, it is forbidden to use the search_path to refer to different database entities at run-time (as David pointed out). So I don't understand how a dynamic "search_path" could be used in any scenario within functions except when EXECUTE is involved. > > Using qualified identifiers everywhere strongly reduces readability. There > are no aliases to the schema, so aliases cannot help. Yes, I agree on that. Using "SET search_path" in the function's definition fixes that problem, but it's easy to miss how important this is from reading the documentation: The manual regarding "CREATE FUNCTION" refers to "search_path" only within the "Writing SECURITY DEFINER Functions Safely" section. It's easy to skip that part unless you use that feature. Moreover, that section alone doesn't explain the weird behavior of four different outcomes of a function with only two schemas involved which I brought up in the beginning of this thread. The part on "SET configuration_parameter" part in the "CREATE FUNCTION" documentation doesn't mention the search_path or schemas. And I don't think you can expect every programmer will read the "Plan Caching" subsection in the "PL/pgSQL under the Hood" section. But even then, the information is just provided indirectly. Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't give any hint either. I think (assuming that the behavior isn't fixed) that some slighly more prominent warning would be reasonable. > > you can identify the functions where search_path is not explicitly assigned > > select oid::regprocedure > from pg_proc > where pronamespace::regnamespace not in ('pg_catalog', > 'information_schema') > and not exists(select 1 from unnest(proconfig) g(v) where v ~ > '^search_path'); > > > Regards > > Pavel Kind regards, Jan
čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens <jbe-mlist@magnetkern.de> napsal:
On Thu, 2 Jan 2025 12:40:59 +0100
Pavel Stehule <pavel.stehule@gmail.com> wrote:
> How can you identify unwanted usage of non qualified identifiers from
> wanted usage of non qualified identifiers? It is a common pattern for
> sharding. Using not qualified identifiers of operators, functions is common
> when you are using orafce extensions, etc.
I don't fully understand the use-case. Could you elaborate?
As I understand, even if identifiers are not fully-qualified, it is
forbidden to use the search_path to refer to different database
entities at run-time (as David pointed out).
So I don't understand how a dynamic "search_path" could be used in any
scenario within functions except when EXECUTE is involved.
you don't need more databases
schema one - customer x
schema two - customer y
create table one.t1(..); create table one.t2(..);
create table two.t1(..); create table two.t2(..);
set search_path to one;
-- work with data set of customer x
set search_path to two;
-- work wit data set of customer y
some times can be pretty ineffective to have database per customer - more connect, disconnect in postgres is much more expensive than SET search_path TO .. and maybe RESET plans;
>
> Using qualified identifiers everywhere strongly reduces readability. There
> are no aliases to the schema, so aliases cannot help.
Yes, I agree on that. Using "SET search_path" in the function's
definition fixes that problem, but it's easy to miss how important this
is from reading the documentation:
The manual regarding "CREATE FUNCTION" refers to "search_path" only
within the "Writing SECURITY DEFINER Functions Safely" section. It's
easy to skip that part unless you use that feature. Moreover, that
section alone doesn't explain the weird behavior of four different
outcomes of a function with only two schemas involved which I brought
up in the beginning of this thread.
The part on "SET configuration_parameter" part in the "CREATE FUNCTION"
documentation doesn't mention the search_path or schemas. And I don't
think you can expect every programmer will read the "Plan Caching"
subsection in the "PL/pgSQL under the Hood" section. But even then, the
information is just provided indirectly.
yes, probably nobody reads the plan caching doc. And if they read it, then because they have performance problems.
Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't
give any hint either.
This is a question - this is a generic feature in Postgres. Every query can be impacted by setting of search_path.
From my perspective, there can be a note in the documentation related to copy types and row types.
The problem that you found is not just about the change of search_path. Same problem can be found after altering the table.
Regards
Pavel
I think (assuming that the behavior isn't fixed) that some slighly more
prominent warning would be reasonable.
>
> you can identify the functions where search_path is not explicitly assigned
>
> select oid::regprocedure
> from pg_proc
> where pronamespace::regnamespace not in ('pg_catalog',
> 'information_schema')
> and not exists(select 1 from unnest(proconfig) g(v) where v ~
> '^search_path');
>
>
> Regards
>
> Pavel
Kind regards,
Jan
On Thu, 2 Jan 2025 13:48:29 +0100 Pavel Stehule <pavel.stehule@gmail.com> wrote: > čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens <jbe-mlist@magnetkern.de> > napsal: > > > On Thu, 2 Jan 2025 12:40:59 +0100 > > Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > > > How can you identify unwanted usage of non qualified identifiers from > > > wanted usage of non qualified identifiers? It is a common pattern for > > > sharding. Using not qualified identifiers of operators, functions is > > common > > > when you are using orafce extensions, etc. > > > > I don't fully understand the use-case. Could you elaborate? > > > > As I understand, even if identifiers are not fully-qualified, it is > > forbidden to use the search_path to refer to different database > > entities at run-time (as David pointed out). > > > > So I don't understand how a dynamic "search_path" could be used in any > > scenario within functions except when EXECUTE is involved. > > > > you don't need more databases > > schema one - customer x > schema two - customer y > > create table one.t1(..); create table one.t2(..); > create table two.t1(..); create table two.t2(..); > > set search_path to one; > -- work with data set of customer x > > set search_path to two; > -- work wit data set of customer y > > some times can be pretty ineffective to have database per customer - more > connect, disconnect in postgres is much more expensive than SET search_path > TO .. and maybe RESET plans; I guess that means there is a practical application where search_path MAY change at runtime IF done in different sessions or if the cache is reset using the DISCARD command: https://www.postgresql.org/docs/17/sql-discard.html I assume DISCARD PLANS would be the right command? This seems to be a very special case though. I think there should be a warning in the documentation of CREATE FUNCTION with regard to schemas anyway, though. Regards, Jan
Hi
>
> some times can be pretty ineffective to have database per customer - more
> connect, disconnect in postgres is much more expensive than SET search_path
> TO .. and maybe RESET plans;
I guess that means there is a practical application where search_path
MAY change at runtime IF done in different sessions or if the cache is
reset using the DISCARD command:
https://www.postgresql.org/docs/17/sql-discard.html
I assume DISCARD PLANS would be the right command?
that depends. plan inside plan cache is invalidated when search_path is different. You use RESET plans because you want to release all plans quickly.
Unfortunately, the types assigned to plpgsql variables are not invalidated. This is the source of problems. It is a classical problem - it is hard to say when you should invalidate cache.
Current design is not ideal - but it is almost a good enough compromise between correctness and performance. It is true, so nobody did some work to fix it. So maybe the impact to performance should not be too bad, but it is not an easy issue. plans are isolated - and the impact of one plan to the second plan is zero. For variables it is exactly opposite.
This seems to be a very special case though. I think there should be a
warning in the documentation of CREATE FUNCTION with regard to schemas
anyway, though.
I am not sure. If you want to use this warning, then it should be everywhere where any non-qualified identifier can be used. Maybe in plpgsql can be more accented so almost everything in plpgsql depends on the current setting of search_path. Lot of people don't understand, so every expression in plpgsql is SQL and every expression is executed like part of a query. And unfortunately there are some different caches - plpgsql cache and plan cache and both caches are invalidated at different times (I think so plpgsql cache is not resetted by RESET PLANS). Maybe it is better to explain how plpgsql works. It is a little bit different from well known interpreted languages.
Regards,
Jan
On Sat, 28 Dec 2024 00:40:09 +0100 Jan Behrens <jbe-mlist@magnetkern.de> wrote: > > Add qualification or attach a “set search_path” clause to “create > > function”. Code stored in the server should not rely on the session > > search_path. > > > > David J. I have been trying to adjust some of my code, and I still have cases where I have to rely on the session's search_path. I'll provide an example below. > > [...] > > My question is: Am I safe if I use fully-qualified types in the DECLARE > section only? Or do I need to provide full qualification also in the > code below (after SET search_path TO 'myschema')? > > And bonus question: Is it documented somewhere? > > [...] > > Kind Regards > Jan Behrens The following code is taken from a project I'm currently working on: ============ -- Let's assume we don't know the name of the schema in which the -- "pgratio" extension with the RATIONAL data type is installed. CREATE SCHEMA "qwertyuiop"; CREATE EXTENSION "pgratio" WITH SCHEMA "qwertyuiop"; -- This installs schema "myschema" with some dynamic function: BEGIN; CREATE SCHEMA "myschema"; SET LOCAL search_path TO "myschema"; -- Append schema of "pgratio" extension, which provides the RATIONAL -- data type, to search_path: SELECT set_config( 'search_path', current_setting('search_path') || ', ' || quote_ident(nspname), TRUE ) FROM pg_namespace, pg_extension WHERE pg_namespace.oid = extnamespace AND extname = 'pgratio'; CREATE DOMAIN "rational_wrapper" AS RATIONAL; CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS RATIONAL -------------------------------------------------------------------- -- I cannot use SET search_path FROM CURRENT here, because "query_p" -- shall refer to tables in the search_path of the caller. -------------------------------------------------------------------- LANGUAGE plpgsql AS $$ DECLARE "old_search_path" TEXT; ---------------------------------------------------------------- -- I have to fully qualify the following type. -- Moreover, I can't use RATIONAL as I don't know its schema. ---------------------------------------------------------------- "result" "myschema"."rational_wrapper"; BEGIN SELECT current_setting('search_path') INTO "old_search_path"; PERFORM set_config( 'search_path', 'myschema, ' || quote_ident(nspname) || ', pg_temp, ' || "old_search_path", TRUE ) FROM pg_namespace, pg_extension WHERE pg_namespace.oid = extnamespace AND extname = 'pgratio'; ---------------------------------------------------------------- -- Is it safe to not fully qualify type RATIONAL below? -- And, if yes, where in the documentation is this explained? ---------------------------------------------------------------- CREATE TEMP TABLE "mytemptable" ("val" RATIONAL); EXECUTE 'INSERT INTO "mytemptable" ' 'SELECT "query"."a" * "query"."b" ' 'FROM (' || "query_p" || ') AS "query"'; -- Do some things here. SELECT sum("val") INTO "result" FROM "mytemptable"; PERFORM set_config('search_path', "old_search_path", TRUE); RETURN "result"; END; $$; COMMIT; CREATE TABLE "tbl" ("foo" INT8, "bar" INT8); INSERT INTO "tbl" VALUES (5, 7), (1, 10); SELECT "myschema"."some_function"( 'SELECT "foo" AS "a", "bar" AS "b" FROM "tbl"' ); \c SELECT "myschema"."some_function"( 'SELECT "foo" AS "a", "bar" AS "b" FROM "tbl"' ); ============ The code for the pgratio extension that provides the RATIONAL data type is found here: https://www.public-software-group.org/pgratio Running that code on my machine correctly gives: some_function --------------- 45 (1 row) You are now connected to database "jbe" as user "jbe". some_function --------------- 45 (1 row) Because extensions can only be installed in one schema, it may be a bad idea to have a component requiring an extension to be installed in a particular schema (because if different components have different expectations on the schema name, e.g. some might expect "pgratio" to be installed in "public" and others might expect it in "pgratio" or some other schema such as "qwertyuiop", this would lead to an unresolvable conflict). I would like to know if the above example is correct. It seems overall bulky, but I haven't found a better way, assuming that it can be unknown where a particular extension has been installed to. In particular I feel a bit insecure about where I have to fully qualify, and where not. See the comments in the code above. Note that I want the function to accept a query that makes sense in the caller's search_path. Thus using "SET search_path FROM CURRENT" is not an option for me, I believe. Regards, Jan Behrens
On Friday, January 3, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
I would like to know if the above example is correct. It seems overall
bulky, but I haven't found a better way, assuming that it can be
unknown where a particular extension has been installed to. In
particular I feel a bit insecure about where I have to fully qualify,
and where not. See the comments in the code above.
Short answer, you cannot looking at a definition and know the answer - whether the code is going to be executed in a sanitized search_path is what matters. Anything that would be executed during pg_restore has to be made safe. Therefore, code that is only ever executed by applications directly can use swarch_path.
I’d probably modify the function signature to take search_path as a second optional argument and then invoke a set search_path within the function. At worse the caller can place current_setting(search_path) as the value of that argument though being explicit would be recommended.
David J.
On Fri, 3 Jan 2025 08:34:57 -0700 "David G. Johnston" <david.g.johnston@gmail.com> wrote: > On Friday, January 3, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote: > > > > I would like to know if the above example is correct. It seems overall > > bulky, but I haven't found a better way, assuming that it can be > > unknown where a particular extension has been installed to. In > > particular I feel a bit insecure about where I have to fully qualify, > > and where not. See the comments in the code above. > > > Short answer, you cannot looking at a definition and know the answer - > whether the code is going to be executed in a sanitized search_path is what > matters. I don't understand. Do you mean my last example is wrong / insecure? If so, why? > Anything that would be executed during pg_restore has to be made > safe. Therefore, code that is only ever executed by applications directly > can use swarch_path. Why should the function be executed during pg_restore? > > I’d probably modify the function signature to take search_path as a second > optional argument and then invoke a set search_path within the function. > At worse the caller can place current_setting(search_path) as the value of > that argument though being explicit would be recommended. > > David J. I could do that, but I would like to understand if that is really necessary as it makes the interface more complicated, and I would like to avoid unnecessary complexity in my interface. Is it really impossible to have functions without SET search_path in the definition of a PL/pgSQL function if I fully-qualify all types in the DECLARE section and if all other non-qualified identifiers occur after set_config('search_path', ...)? Kind regards, Jan Behrens
On Fri, Jan 3, 2025 at 9:48 AM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
On Fri, 3 Jan 2025 08:34:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:
> On Friday, January 3, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> >
> > I would like to know if the above example is correct. It seems overall
> > bulky, but I haven't found a better way, assuming that it can be
> > unknown where a particular extension has been installed to. In
> > particular I feel a bit insecure about where I have to fully qualify,
> > and where not. See the comments in the code above.
>
>
> Short answer, you cannot looking at a definition and know the answer -
> whether the code is going to be executed in a sanitized search_path is what
> matters.
I don't understand. Do you mean my last example is wrong / insecure?
If so, why?
It is at risk because it depends on the session search_path. That is all. Whether that risk turns into a failure to execute depends on how/when it is executed. I'm not that comfortable talking about security risks in this context though the current design goal is to mitigate such security issues by setting things up so the function execution fails rather than is executed insecurely. This is presently mainly done by setting the search_path to just effectively pg_catalog before executing the query, breaking any code depending on other schemas existing in the search_path.
> Anything that would be executed during pg_restore has to be made
> safe. Therefore, code that is only ever executed by applications directly
> can use swarch_path.
Why should the function be executed during pg_restore?
If the function is used in building an index, or a materialized view, are the common cases. Trigger functions too.
Note, this is talking about evaluating functions generally, not the one provided here specifically.
I could do that, but I would like to understand if that is really
necessary as it makes the interface more complicated, and I would like
to avoid unnecessary complexity in my interface.
Is it really impossible to have functions without SET search_path in
the definition of a PL/pgSQL function if I fully-qualify all types in
the DECLARE section and if all other non-qualified identifiers occur
after set_config('search_path', ...)?
If you add a set_config to the body of the function then you indeed avoid the problem. It is basically equivalent to adding a SET clause to the create function command. In this case even when the function is executed in a sanitized search_path environment (such as the one established by pg_restore) you are not relying on it. That non-reliance is all that really matters.
David J.
On Fri, 3 Jan 2025 10:16:15 -0700 "David G. Johnston" <david.g.johnston@gmail.com> wrote: > It is at risk because it depends on the session search_path. That is all. > Whether that risk turns into a failure to execute depends on how/when it is > executed. I'm not that comfortable talking about security risks in this > context though the current design goal is to mitigate such security issues > by setting things up so the function execution fails rather than is > executed insecurely. This is presently mainly done by setting the > search_path to just effectively pg_catalog before executing the query, > breaking any code depending on other schemas existing in the search_path. I'm not sure if there is a misunderstanding. In my last example (e-mail dated Fri, 3 Jan 2025 13:53:32 +0100), the user who has control over the contents of the "query_p" argument is an application programmer, not a real end-user. The function is also *not* marked as SECURITY DEFINER, so it always runs with the privileges of the caller. I don't see any specific security risk here, except that I'm unsure if the function is written properly with regard to qualification of the used types after PL/pgSQL's BEGIN. As I learned, I must fully-qualify types *before* the BEGIN, i.e. in the DECLARE section. But does this also hold for types after the BEGIN when I previously ensure that the search_path is correctly set (set within the function's body)? > > > > Anything that would be executed during pg_restore has to be made > > > safe. Therefore, code that is only ever executed by applications > > directly > > > can use swarch_path. > > > > Why should the function be executed during pg_restore? > > > If the function is used in building an index, or a materialized view, are > the common cases. Trigger functions too. > > Note, this is talking about evaluating functions generally, not the one > provided here specifically. I don't think my function would be evaluated during a pg_restore then. > > > I could do that, but I would like to understand if that is really > > necessary as it makes the interface more complicated, and I would like > > to avoid unnecessary complexity in my interface. > > > > Is it really impossible to have functions without SET search_path in > > the definition of a PL/pgSQL function if I fully-qualify all types in > > the DECLARE section and if all other non-qualified identifiers occur > > after set_config('search_path', ...)? > > > If you add a set_config to the body of the function then you indeed avoid > the problem. It is basically equivalent to adding a SET clause to the > create function command. In this case even when the function is executed > in a sanitized search_path environment (such as the one established by > pg_restore) you are not relying on it. That non-reliance is all that > really matters. > > David J. But if I use "SET search_path FROM CURRENT", then the called function won't know the search_path that is set at the caller's side (which is what I need to make my interface nice to use). I would prefer my current solution, but I would like to be sure that my example (the one in my e-mail dated Fri, 3 Jan 2025 13:53:32 +0100) is correct. I still am not sure about that. Kind Regards, Jan Behrens
On 1/3/25 13:33, Jan Behrens wrote: > On Fri, 3 Jan 2025 10:16:15 -0700 > "David G. Johnston" <david.g.johnston@gmail.com> wrote: > > > But if I use "SET search_path FROM CURRENT", then the called function > won't know the search_path that is set at the caller's side (which is > what I need to make my interface nice to use). At this point I am lost as to what the overall goal of this is. Can you provide a 10000 ft view if what it is you are trying to achieve? > > I would prefer my current solution, but I would like to be sure that my > example (the one in my e-mail dated Fri, 3 Jan 2025 13:53:32 +0100) is > correct. I still am not sure about that. > > Kind Regards, > Jan Behrens > > -- Adrian Klaver adrian.klaver@aklaver.com
On Fri, Jan 3, 2025 at 2:33 PM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
I would prefer my current solution, but I would like to be sure that my
example (the one in my e-mail dated Fri, 3 Jan 2025 13:53:32 +0100) is
correct. I still am not sure about that.
If it does what you want and it is only ever executed by application code over a client connection you should be fine. Your client connection will always have whatever search_path you arrange to have in place and the application developer will know whether their environment is correct or not when they test it, and have the ability to change their environment as needed.
David J.
On Fri, 3 Jan 2025 13:56:02 -0800 Adrian Klaver <adrian.klaver@aklaver.com> wrote: > At this point I am lost as to what the overall goal of this is. > > Can you provide a 10000 ft view if what it is you are trying to achieve? Sure! I would like to create a component (e.g. a PostgreSQL extension) that provides a function which processes some complex data, without making any requirements regarding where the data is stored. To pass this data to the function, I could use arrays of composite types, but that seems to be very bulky. Another option would be to use cursors, but that didn't turn out to work very smooth either. Instead, I plan to expect the function to receive a query string that will get the data that is being processed by the function. That query string should be allowed to refer to tables in the search_path at the caller's side. Therefore, I cannot use the "SET search_path FROM CURRENT" in my "CREATE FUNCTION" statement, because it would overwrite the current search_path on each call of the function. Thus my idea is to do this (simplified): CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type" -------------------------------------------------------------------- -- I cannot use SET search_path FROM CURRENT here, because "query_p" -- shall refer to tables in the search_path of the caller. -------------------------------------------------------------------- LANGUAGE plpgsql AS $$ DECLARE "old_search_path" TEXT; ---------------------------------------------------------------- -- I have to fully qualify types in the DECLARE section. ---------------------------------------------------------------- "some_variable" "some_schema"."some_type"; BEGIN SELECT current_setting('search_path') INTO "old_search_path"; PERFORM set_config( 'search_path', 'some_schema, pg_temp, ' || "old_search_path", TRUE ); ---------------------------------------------------------------- -- Do I have to fully qualify types and operators from -- "myschema" here? Or is it safe to not fully qualify them? ---------------------------------------------------------------- END; $$; That is my overall idea. My problem is that I'm confused about WHEN EXACTLY I have to qualify tables/types, etc. It is very hard to understand from reading (just) the documentation. Kind Regards, Jan Behrens
On Sat, 4 Jan 2025 00:22:03 +0100 Jan Behrens <jbe-mlist@magnetkern.de> wrote: > ---------------------------------------------------------------- > -- Do I have to fully qualify types and operators from > -- "myschema" here? Or is it safe to not fully qualify them? > ---------------------------------------------------------------- This was meant to read: ...from "some_schema" here.
On Fri, 3 Jan 2025 at 18:22, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
Instead, I plan to expect the function to receive a query string that
will get the data that is being processed by the function.
That query string should be allowed to refer to tables in the
search_path at the caller's side.
Therefore, I cannot use the "SET search_path FROM CURRENT" in my
"CREATE FUNCTION" statement, because it would overwrite the current
search_path on each call of the function.
I wonder if it would help if EXECUTE took an optional search_path to use while executing the query.
On Fri, 3 Jan 2025 18:36:13 -0500 Isaac Morland <isaac.morland@gmail.com> wrote: > On Fri, 3 Jan 2025 at 18:22, Jan Behrens <jbe-mlist@magnetkern.de> wrote: > > > > Instead, I plan to expect the function to receive a query string that > > will get the data that is being processed by the function. > > > > That query string should be allowed to refer to tables in the > > search_path at the caller's side. > > > > Therefore, I cannot use the "SET search_path FROM CURRENT" in my > > "CREATE FUNCTION" statement, because it would overwrite the current > > search_path on each call of the function. > > > > I wonder if it would help if EXECUTE took an optional search_path to use > while executing the query. That wouldn't solve my problem, because the function that includes the EXECUTE still needs to know the search_path set on the caller side. This only works if I omit the "SET search_path FROM CURRENT" option in the function's definition OR if I pass a search_path as an argument. I guess I could write a wrapper: ============ BEGIN; CREATE SCHEMA "some_schema"; SET LOCAL search_path TO "some_schema"; CREATE TYPE "some_type" AS ("name" TEXT, "height" INT8); CREATE FUNCTION "foo_impl"("query_p" TEXT, "search_path_p" TEXT) RETURNS "some_type" LANGUAGE plpgsql SET search_path FROM CURRENT AS $$ DECLARE "old_search_path" TEXT; "result" "some_type"; BEGIN "old_search_path" = current_setting('search_path'); PERFORM set_config('search_path', "search_path_p", TRUE); EXECUTE "query_p" INTO "result"; PERFORM set_config('search_path', "old_search_path", TRUE); RETURN "result"; END; $$; CREATE FUNCTION "foo"("query_p" TEXT) RETURNS "some_type" RETURN "foo_impl"("query_p", current_setting('search_path')); COMMIT; CREATE TABLE "tbl" ("id" SERIAL8, "name" TEXT, "height" INT8); INSERT INTO "tbl" ("name", "height") VALUES ('Alice', 200); SELECT * FROM "some_schema"."foo"('SELECT "name" FROM "tbl"'); ============ Not sure which variant (this or my previous attempt) is better and if either is safe/correct. Regards, Jan Behrens
On 1/3/25 15:22, Jan Behrens wrote: > On Fri, 3 Jan 2025 13:56:02 -0800 > Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> At this point I am lost as to what the overall goal of this is. >> >> Can you provide a 10000 ft view if what it is you are trying to achieve? > > Sure! I would like to create a component (e.g. a PostgreSQL extension) > that provides a function which processes some complex data, without > making any requirements regarding where the data is stored. To pass > this data to the function, I could use arrays of composite types, but > that seems to be very bulky. Another option would be to use cursors, > but that didn't turn out to work very smooth either. > > Instead, I plan to expect the function to receive a query string that > will get the data that is being processed by the function. > > That query string should be allowed to refer to tables in the > search_path at the caller's side. > > Therefore, I cannot use the "SET search_path FROM CURRENT" in my > "CREATE FUNCTION" statement, because it would overwrite the current > search_path on each call of the function. > > Thus my idea is to do this (simplified): > > CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type" > > -------------------------------------------------------------------- > -- I cannot use SET search_path FROM CURRENT here, because "query_p" > -- shall refer to tables in the search_path of the caller. > -------------------------------------------------------------------- > > LANGUAGE plpgsql AS $$ > DECLARE > "old_search_path" TEXT; > > ---------------------------------------------------------------- > -- I have to fully qualify types in the DECLARE section. > ---------------------------------------------------------------- > > "some_variable" "some_schema"."some_type"; > BEGIN > SELECT current_setting('search_path') INTO "old_search_path"; > PERFORM set_config( > 'search_path', > 'some_schema, pg_temp, ' || "old_search_path", > TRUE > ); > > ---------------------------------------------------------------- > -- Do I have to fully qualify types and operators from > -- "myschema" here? Or is it safe to not fully qualify them? > ---------------------------------------------------------------- > END; > $$; > > That is my overall idea. Is 'some_schema' a known item when installing? Once you have the search_path defined and assuming all the objects you want are in that path, then yes you can drop the schema qualification. > > My problem is that I'm confused about WHEN EXACTLY I have to qualify > tables/types, etc. It is very hard to understand from reading (just) the > documentation. If you are doing this as an extension then I suspect you want the processes shown here: https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-RELOCATION > > Kind Regards, > Jan Behrens -- Adrian Klaver adrian.klaver@aklaver.com
On Sat, 4 Jan 2025 09:37:14 -0800 Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 1/3/25 15:22, Jan Behrens wrote: > > On Fri, 3 Jan 2025 13:56:02 -0800 > > Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > > >> At this point I am lost as to what the overall goal of this is. > >> > >> Can you provide a 10000 ft view if what it is you are trying to achieve? > > > > Sure! I would like to create a component (e.g. a PostgreSQL extension) > > that provides a function which processes some complex data, without > > making any requirements regarding where the data is stored. To pass > > this data to the function, I could use arrays of composite types, but > > that seems to be very bulky. Another option would be to use cursors, > > but that didn't turn out to work very smooth either. > > > > Instead, I plan to expect the function to receive a query string that > > will get the data that is being processed by the function. > > > > That query string should be allowed to refer to tables in the > > search_path at the caller's side. > > > > Therefore, I cannot use the "SET search_path FROM CURRENT" in my > > "CREATE FUNCTION" statement, because it would overwrite the current > > search_path on each call of the function. > > > > Thus my idea is to do this (simplified): > > > > CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type" > > > > -------------------------------------------------------------------- > > -- I cannot use SET search_path FROM CURRENT here, because "query_p" > > -- shall refer to tables in the search_path of the caller. > > -------------------------------------------------------------------- > > > > LANGUAGE plpgsql AS $$ > > DECLARE > > "old_search_path" TEXT; > > > > ---------------------------------------------------------------- > > -- I have to fully qualify types in the DECLARE section. > > ---------------------------------------------------------------- > > > > "some_variable" "some_schema"."some_type"; > > BEGIN > > SELECT current_setting('search_path') INTO "old_search_path"; > > PERFORM set_config( > > 'search_path', > > 'some_schema, pg_temp, ' || "old_search_path", > > TRUE > > ); > > > > ---------------------------------------------------------------- > > -- Do I have to fully qualify types and operators from > > -- "myschema" here? Or is it safe to not fully qualify them? (correction: "some_schema") > > ---------------------------------------------------------------- > > END; > > $$; > > > > That is my overall idea. > > Is 'some_schema' a known item when installing? Yes, fortunately "some_schema" is a fixed name. > > Once you have the search_path defined and assuming all the objects you > want are in that path, then yes you can drop the schema qualification. That would be nice, but it doesn't seem to be the case. At least not always. I constructed the following new example: ============ CREATE TABLE "tbl" ("col" NUMERIC(15, 0)); CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN RETURN '2.4'; END; $$; BEGIN; CREATE SCHEMA "myschema"; SET LOCAL search_path TO 'myschema'; CREATE TABLE "tbl" ("col" NUMERIC); CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN RETURN '5.4'; END; $$; CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$ DECLARE "old_search_path" TEXT; BEGIN "old_search_path" := current_setting('search_path'); SET LOCAL search_path TO "myschema"; -- At this point, search_path is always set to 'myschema'! DECLARE "variable" "tbl"."col"%TYPE; BEGIN "variable" := "foo"(); RETURN "variable"; END; PERFORM set_config('search_path', "old_search_path", TRUE); END; $$; COMMIT; SELECT "myschema"."run"(); -- returns '5.4' (when run in the same session) -- reconnect to database here: \c SELECT "myschema"."run"(); -- returns '5' SET search_path TO 'myschema'; SELECT "myschema"."run"(); -- returns '5' -- reconnect to database again: \c SET search_path TO 'myschema'; SELECT "myschema"."run"(); -- returns '5.4' SET search_path TO 'public'; SELECT "myschema"."run"(); -- returns '5.4' ============ Even if DECLARE "variable" "tbl"."col"%TYPE; follows *after* the schema is set to "myschema" in the example above, I still get differing results, depending on how the search_path was set when the function was first called. I think this has to do with the fact that the overall structure and probably types(?) are parsed first? As Tom Lane wrote on Fri, 27 Dec 2024 16:03:17 -0500, "the types of plpgsql variables are only looked up on the first use (within a session)." Does this apply to *all* types (e.g. types used in type-casts in statements after BEGIN)? Or does it only apply to types in the DECLARE section? Maybe my most recent example is somewhat "crafted", but it makes me feel insecure about what I can rely on. Could someone explain to me what the exact rules are, or where to find them? I don't seem to understand the exact behavior from reading the docs. I re-read section 41.11.2. on Plan Caching: "The PL/pgSQL interpreter parses the function's source text and produces an internal binary instruction tree the first time the function is called (within each session). The instruction tree fully translates the PL/pgSQL statement structure, but individual SQL expressions and SQL commands used in the function are not translated immediately. As each expression and SQL command is first executed in the function, the PL/pgSQL interpreter parses and analyzes the command to create a prepared statement, using the SPI manager's SPI_prepare function. Subsequent visits to that expression or command reuse the prepared statement. [...]" It isn't specific about how DECLARE blocks are handled. > > > > > My problem is that I'm confused about WHEN EXACTLY I have to qualify > > tables/types, etc. It is very hard to understand from reading (just) the > > documentation. > > If you are doing this as an extension then I suspect you want the > processes shown here: > > https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-RELOCATION Yes, I'm aware of that, but the code defining the function is not part of an extension in my case (only the RATIONAL type is part of an extension). But thank you for pointing this out. I have been using @extschema@ in extension code before, and recently also learned about the @extschema:name@ syntax. > -- > Adrian Klaver > adrian.klaver@aklaver.com Many thanks already for all your input. Regards, Jan Behrens
On Saturday, January 4, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
Even if
DECLARE "variable" "tbl"."col"%TYPE;
follows *after* the schema is set to "myschema" in the example above, I
still get differing results, depending on how the search_path was set
when the function was first called.
I think this has to do with the fact that the overall structure and
probably types(?) are parsed first?
I concur that this dynamic doesn’t seem to be discussed. Namely that in the presence of nested blocks the parse phase resolves placeholders for all declared variables without executing any expressions in the body of the function; therefore all types will be resolved seeing the same search_path, namely that of the calling session or established using SET. Changing the search_path within an outer function body block will not affect declarations within an inner block. (I am not sure whether the for-loop cases are exceptional in this.)
David J.
On Saturday, January 4, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
CREATE FUNCTION "foo_impl"("query_p" TEXT, "search_path_p" TEXT)
RETURNS "some_type"
LANGUAGE plpgsql SET search_path FROM CURRENT AS $$
DECLARE
"old_search_path" TEXT;
"result" "some_type";
BEGIN
"old_search_path" = current_setting('search_path');
PERFORM set_config('search_path', "search_path_p", TRUE);
EXECUTE "query_p" INTO "result";
PERFORM set_config('search_path', "old_search_path", TRUE);
RETURN "result";
END;
$$;
You might consider adding a polymorphic argument for the result type. Then if you call the function with two different typed inputs it will be cached once for each.
“ Likewise, functions having polymorphic argument types have a separate statement cache for each combination of actual argument types they have been invoked for, so that data type differences do not cause unexpected failures.”
David J.
On Saturday, January 4, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
I re-read section 41.11.2. on Plan Caching:
"The PL/pgSQL interpreter parses the function's source text and
produces an internal binary instruction tree the first time the
function is called (within each session). The instruction tree fully
translates the PL/pgSQL statement structure,
The type of a plpgsql variable is by definition its structure; established in a statement, so this is actually covered by that paragraph. But I would be for adding a bit more specific terminology here.
David J.
On Sun, 2025-01-05 at 00:12 +0100, Jan Behrens wrote: > I constructed the following new example: > > ============ > > CREATE TABLE "tbl" ("col" NUMERIC(15, 0)); > > CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$ > BEGIN > RETURN '2.4'; > END; > $$; > > BEGIN; > > CREATE SCHEMA "myschema"; > SET LOCAL search_path TO 'myschema'; > > CREATE TABLE "tbl" ("col" NUMERIC); > > CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$ > BEGIN > RETURN '5.4'; > END; > $$; > > CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$ > DECLARE > "old_search_path" TEXT; > BEGIN > "old_search_path" := current_setting('search_path'); > SET LOCAL search_path TO "myschema"; > -- At this point, search_path is always set to 'myschema'! > DECLARE > "variable" "tbl"."col"%TYPE; > BEGIN > "variable" := "foo"(); > RETURN "variable"; > END; > PERFORM set_config('search_path', "old_search_path", TRUE); > END; > $$; > > COMMIT; > > Even if > > DECLARE "variable" "tbl"."col"%TYPE; > > follows *after* the schema is set to "myschema" in the example above, I > still get differing results, depending on how the search_path was set > when the function was first called. So what you should do is set the "search_path" *on* the function, not *in* the function: CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql SET search_path = myschema AS $$ DECLARE "variable" "tbl"."col"%TYPE; BEGIN "variable" := "foo"(); RETURN "variable"; END; $$; Yours, Laurenz Albe
On Sun, 05 Jan 2025 07:48:56 +0100 Laurenz Albe <laurenz.albe@cybertec.at> wrote: > So what you should do is set the "search_path" *on* the function, not *in* > the function: > > CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql > SET search_path = myschema > AS $$ > DECLARE > "variable" "tbl"."col"%TYPE; > BEGIN > "variable" := "foo"(); > RETURN "variable"; > END; > $$; > > Yours, > Laurenz Albe Yes, that's what works and what I would also do whenever possible (probably in the form "SET search_path FROM CURRENT"). Summarizing the remaining thread, some issues are: * The documentation isn't providing a prominent warning that behavior can be surprising if "SET search_path" is not used in the function's or procedure's defintion. (E.g. searching for "schema" in the documentation page for "CREATE FUNCTION" doesn't give any helpful hints or warning.) * Things get more complicated when it's impossible to use "SET search_path" in the function's/procedure's definition, for which there are two scenarios: Scenario 1: The function or procedure needs or wants to access or use the search_path of the caller. Scenario 2: A procedure wants to execute transactional statements such as COMMIT or ROLLBACK within its body. In scenario 1, using "SET search_path" will overwrite the caller's search_path at runtime, so I cannot access it. (In my post from Sat, 4 Jan 2025 14:23:10 +0100, I have proposed a wrapper function to work around that.) In scenario 2, using "SET search_path" is simply not possible and will be rejected by PostgreSQL. * It is a bit unclear how the exact behavior is when I set a search_path from within the functions body (e.g. due to one of the two scenarios above). There are some examples that show some quite surprising behavior, at least if you don't fully understand the plan caching mechanism that is used. Kind regards, Jan Behrens