Thread: search_path for PL/pgSQL functions partially cached?

search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
Hello,

I'm experiencing some weird issues when running the following code in a psql session:

============

CREATE TABLE "tbl" ("col" NUMERIC(15, 0));

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
  BEGIN
    RETURN '2.4';
  END;
$$;

BEGIN;

CREATE SCHEMA "myschema";
SET LOCAL search_path TO 'myschema';

CREATE TABLE "tbl" ("col" NUMERIC);

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
  BEGIN
    RETURN '5.4';
  END;
$$;

CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$
  DECLARE
    "variable" "tbl"."col"%TYPE;
  BEGIN
    "variable" := "foo"();
    RETURN "variable";
  END;
$$;

COMMIT;

SELECT "myschema"."run"(); -- returns '2.4' (when run in the same session)

-- reconnect to database here:
\c

SELECT "myschema"."run"(); -- returns '2'
SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5'

-- reconnect to database again:
\c

SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5.4'
SET search_path TO 'public';
SELECT "myschema"."run"(); -- returns '2.4' again

============

I'm using PostgreSQL verison 16.4.

Is this the expected behavior? If yes, where is this documented? If no, what would be the expected behavior?

Of course, I could fix this by fully qualifying the table name "tbl" in the function. Nonetheless, I'm not really sure
what'sgoing on here.
 

It seems that it matters *both* how the search_path was set during the *first* invocation of the function within a
session*and* how it is set during the actual call of the function. So even if there are just two schemas involved,
thereare 4 possible outcomes for the "run" function's result ('2.4', '2', '5', and '5.4'). To me, this behavior seems
tobe somewhat dangerous. Maybe it is even considered a bug? Or is it documented somewhere? I remember running into some
problemslike that in the past already, but unfortunately, I don't remember details.
 

I suppose this is because there is some caching mechanism in place. But apparently it only caches the "tbl"."col"%TYPE
andnot the "foo"() function call expression. Can someone explain to me what's going on, and what is the best practice
todeal with it? Is there a way to avoid fully qualifying every type and expression? Which parts do I have to qualify or
isthis something that could be fixed in a future version of PostgreSQL?
 

Many thanks and kind regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
"David G. Johnston"
Date:
On Friday, December 27, 2024, Jan Behrens <jbe-mlist@magnetkern.de> wrote:

It seems that it matters *both* how the search_path was set during the *first* invocation of the function within a session *and* how it is set during the actual call of the function. So even if there are just two schemas involved, there are 4 possible outcomes for the "run" function's result ('2.4', '2', '5', and '5.4'). To me, this behavior seems to be somewhat dangerous. Maybe it is even considered a bug?

It is what it is - and if one is not careful one can end up writing hard-to-understand and possibly buggy code due to the various execution environments and caches involved.

I’ve never really understood why “%TYPE’ exists…


Or is it documented somewhere? 


Can someone explain to me what's going on, and what is the best practice to deal with it? Is there a way to avoid fully qualifying every type and expression? Which parts do I have to qualify or is this something that could be fixed in a future version of PostgreSQL?

Add qualification or attach a “set search_path” clause to “create function”.  Code stored in the server should not rely on the session search_path.

David J.

Re: search_path for PL/pgSQL functions partially cached?

From
Pavel Stehule
Date:
Hi

pá 27. 12. 2024 v 21:26 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Friday, December 27, 2024, Jan Behrens <jbe-mlist@magnetkern.de> wrote:

It seems that it matters *both* how the search_path was set during the *first* invocation of the function within a session *and* how it is set during the actual call of the function. So even if there are just two schemas involved, there are 4 possible outcomes for the "run" function's result ('2.4', '2', '5', and '5.4'). To me, this behavior seems to be somewhat dangerous. Maybe it is even considered a bug?

It is what it is - and if one is not careful one can end up writing hard-to-understand and possibly buggy code due to the various execution environments and caches involved.

I think plan cache should be invalidated when search_path is different, but maybe there is some bug - there are some optimizations related to faster execution of simple expressions.


I’ve never really understood why “%TYPE’ exists…

referenced types should increase readability - it ensures type compatibility - minimally on oracle, where the change of schema requires recompilation. In Postgres it is working on 99% - plpgsql functions don't hold dependency on types.



Or is it documented somewhere? 


Can someone explain to me what's going on, and what is the best practice to deal with it? Is there a way to avoid fully qualifying every type and expression? Which parts do I have to qualify or is this something that could be fixed in a future version of PostgreSQL?

Add qualification or attach a “set search_path” clause to “create function”.  Code stored in the server should not rely on the session search_path.

a lot of functionality in Postgres depends on the search path - and then all should be consistent. Sure, writing procedures that depend on the current search path can be a short way to hell.

I cannot to reproduce it

CREATE OR REPLACE FUNCTION s1.fx1()
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
begin
  return 100;
end
$function$

CREATE OR REPLACE FUNCTION s2.fx1()
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
begin
  return 200;
end
$function$
 
CREATE OR REPLACE FUNCTION public.foo()
 RETURNS void
 LANGUAGE plpgsql
AS $function$ 
declare v int;
begin v := fx1();
  raise notice '%', v;
end;
$function$

(2024-12-27 21:53:13) postgres=# set search_path to s1;
SET
(2024-12-27 21:53:34) postgres=# select public.foo();
NOTICE:  100
┌─────┐
│ foo │
╞═════╡
│     │
└─────┘
(1 row)

(2024-12-27 21:53:44) postgres=# set search_path to s2;
SET
(2024-12-27 21:53:47) postgres=# select public.foo();
NOTICE:  200
┌─────┐
│ foo │
╞═════╡
│     │
└─────┘
(1 row)

(2024-12-27 21:53:48) postgres=# set search_path to s1;
SET
(2024-12-27 21:53:51) postgres=# select public.foo();
NOTICE:  100
┌─────┐
│ foo │
╞═════╡
│     │
└─────┘
(1 row)

so from my perspective is pg ok, tested on pg16 and pg18






David J.

Re: search_path for PL/pgSQL functions partially cached?

From
Adrian Klaver
Date:
On 12/27/24 12:26, David G. Johnston wrote:
> On Friday, December 27, 2024, Jan Behrens <jbe-mlist@magnetkern.de 
> <mailto:jbe-mlist@magnetkern.de>> wrote:
> 
> 
>     It seems that it matters *both* how the search_path was set during
>     the *first* invocation of the function within a session *and* how it
>     is set during the actual call of the function. So even if there are
>     just two schemas involved, there are 4 possible outcomes for the
>     "run" function's result ('2.4', '2', '5', and '5.4'). To me, this
>     behavior seems to be somewhat dangerous. Maybe it is even considered
>     a bug?
> 
> 
> It is what it is - and if one is not careful one can end up writing 
> hard-to-understand and possibly buggy code due to the various execution 
> environments and caches involved.
> 
> I’ve never really understood why “%TYPE’ exists…

Per:

https://www.postgresql.org/docs/current/plpgsql-declarations.html#PLPGSQL-DECLARATION-TYPE

"By using %TYPE you don't need to know the data type of the structure 
you are referencing, and most importantly, if the data type of the 
referenced item changes in the future (for instance: you change the type 
of user_id from integer to real), you might not need to change your 
function definition.

%TYPE is particularly valuable in polymorphic functions, since the data 
types needed for internal variables can change from one call to the 
next. Appropriate variables can be created by applying %TYPE to the 
function's arguments or result placeholders."

The second case I can buy, the first I am not so sure of. It seems to me 
the first case it can be 'solved' by the second case.


> 
> 
>     Or is it documented somewhere? 
> 
> 
> https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
<https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING>
> 
>     Can someone explain to me what's going on, and what is the best
>     practice to deal with it? Is there a way to avoid fully qualifying
>     every type and expression? Which parts do I have to qualify or is
>     this something that could be fixed in a future version of PostgreSQL?
> 
> 
> Add qualification or attach a “set search_path” clause to “create 
> function”.  Code stored in the server should not rely on the session 
> search_path.
> 
> David J.
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: search_path for PL/pgSQL functions partially cached?

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> It is what it is - and if one is not careful one can end up writing
> hard-to-understand and possibly buggy code due to the various execution
> environments and caches involved.

Yeah, I don't see this changing.  The actual answer is that we have
search_path-aware caching of expressions and query plans within a
plpgsql function, which is why the call to foo() reacts to the current
search path.  But the types of plpgsql variables are only looked up
on the first use (within a session).  Perhaps we ought to work harder
on that, but it seems like a lot of overhead to add for something that
will benefit next to nobody.

> I’ve never really understood why “%TYPE’ exists…

Compatibility with Oracle, I imagine.  I agree it's a bizarre feature.
But you could get the same behavior without %TYPE, just by referencing
some other type that has different declarations in different schemas.

> Add qualification or attach a “set search_path” clause to “create
> function”.  Code stored in the server should not rely on the session
> search_path.

Yeah, adding "set search_path" is recommendable if you don't want to
think hard about this stuff.

            regards, tom lane



Re: search_path for PL/pgSQL functions partially cached?

From
Pavel Stehule
Date:


pá 27. 12. 2024 v 22:03 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> It is what it is - and if one is not careful one can end up writing
> hard-to-understand and possibly buggy code due to the various execution
> environments and caches involved.

Yeah, I don't see this changing.  The actual answer is that we have
search_path-aware caching of expressions and query plans within a
plpgsql function, which is why the call to foo() reacts to the current
search path.  But the types of plpgsql variables are only looked up
on the first use (within a session).  Perhaps we ought to work harder
on that, but it seems like a lot of overhead to add for something that
will benefit next to nobody.

> I’ve never really understood why “%TYPE’ exists…

Compatibility with Oracle, I imagine.  I agree it's a bizarre feature.
But you could get the same behavior without %TYPE, just by referencing
some other type that has different declarations in different schemas.

This feature is not bizarre - just the implementation in Postgres is not fully complete (and I am not sure if it is fixable). PLpgSQL uses plan cache, but there is nothing similar for types.
It is designed for Oracle where search_path doesn't exist, and where change of schema invalidates code, and requires recompilation. PL/pgSQL and
Postgres are much more dynamic systems than Oracle. Maybe PL/pgSQL functions can holds dependency on types, and when any related custom type
is changed, then the cached function can be invalidated. Unfortunately, the frequent change of search path can kill the performance.


> Add qualification or attach a “set search_path” clause to “create
> function”.  Code stored in the server should not rely on the session
> search_path.

Yeah, adding "set search_path" is recommendable if you don't want to
think hard about this stuff.

                        regards, tom lane


Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Fri, 27 Dec 2024 13:26:28 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> > Or is it documented somewhere?
> 
> https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING

I can't find any notes regarding functions and schemas in that section.

> Can someone explain to me what's going on, and what is the best practice to
> > deal with it? Is there a way to avoid fully qualifying every type and
> > expression? Which parts do I have to qualify or is this something that
> > could be fixed in a future version of PostgreSQL?
> >
> 
> Add qualification or attach a “set search_path” clause to “create
> function”.  Code stored in the server should not rely on the session
> search_path.
> 
> David J.

In my (real world) case, I was unable to use "SET search_path FROM
CURRENT" because it isn't possible to use "SET" in procedures that use
transactions, due to this documented limitation:

"If a SET clause is attached to a procedure, then that procedure cannot
execute transaction control statements (for example, COMMIT and
ROLLBACK, depending on the language)."

https://www.postgresql.org/docs/17/sql-createprocedure.html

My procedure looks more or less like this:

CREATE PROCEDURE "myfunc"()
  LANGUAGE plpgsql AS
  $$
  DECLARE
    "old_search_path" TEXT;
    -- some more variables
  BEGIN
    SELECT current_setting('search_path') INTO "old_search_path";
    SET search_path TO 'myschema';
    -- some code that uses COMMIT and SET TRANSACTION ISOLATION LEVEL
    PERFORM set_config('search_path', "old_search_path", FALSE);
  END;
  $$;

My question is: Am I safe if I use fully-qualified types in the DECLARE
section only? Or do I need to provide full qualification also in the
code below (after SET search_path TO 'myschema')?

And bonus question: Is it documented somewhere?

Maybe not many people run into these issues because schemas and
functions aren't used as often in combination?

Kind Regards
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
Pavel Stehule
Date:
Hi


Maybe not many people run into these issues because schemas and
functions aren't used as often in combination?

I think schema and functions are common combinations. But when people have objects with the same name, then they are careful to be sure, so objects have really identical structure.
Using different types in these objects is very rare. And because Postgres doesn't support it well, experienced developers don't use it. Similar issues can do some issues after an stored procedures update, because can require session reset. Or when you need it, you can use a much more dynamic type like record.


Kind Regards
Jan Behrens


Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Sat, 28 Dec 2024 00:40:09 +0100
Jan Behrens <jbe-mlist@magnetkern.de> wrote:

> On Fri, 27 Dec 2024 13:26:28 -0700
> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> 
> > > Or is it documented somewhere?
> > 
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> 
> I can't find any notes regarding functions and schemas in that section.

Actually, I found another note in the documentation. But it doesn't
explain things correctly. In the documentation for PostgreSQL 17,
section 36.17.6.1. (Security Considerations for Extension Functions)
says:

"SQL-language and PL-language functions provided by extensions are at
risk of search-path-based attacks when they are executed, since parsing
of these functions occurs at execution time not creation time."

https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-SECURITY

So here, the manual explicity states that functions are parsed at
execution, not creation time. As seen in my original example in this
thread, this isn't (fully) true. Moreover, it isn't true for all
SQL-language functions, as can be demonstrated with the following code:

============

CREATE SCHEMA s1;
CREATE SCHEMA s2;

CREATE VIEW s1.v AS SELECT 'creation' AS col;
CREATE VIEW s2.v AS SELECT 'runtime' AS col;

SET search_path TO 'public', 's1';

CREATE FUNCTION use_sql_atomic() RETURNS TEXT LANGUAGE sql BEGIN ATOMIC
  SELECT 'use_sql_atomic = ' || col FROM v;
END;

CREATE FUNCTION use_sql_string() RETURNS TEXT LANGUAGE sql AS $$
  SELECT 'use_sql_string = ' || col FROM v;
$$;

CREATE FUNCTION use_plpgsql() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN
  RETURN (SELECT 'use_plpgsql = ' || col FROM v);
END; $$;

SET search_path TO 'public', 's2';

SELECT use_sql_atomic() AS "output" UNION ALL
SELECT use_sql_string() AS "output" UNION ALL
SELECT use_plpgsql() AS "output";

============

This generates the following output:

          output           
---------------------------
 use_sql_atomic = creation
 use_sql_string = runtime
 use_plpgsql = runtime
(3 rows)

Overall, PostgreSQL doesn't behave consistent, and to me it seems that
the documentation isn't describing its behavior correctly either.

I understand if fixing this is too much work (even though I would
really like to see this fixed). But given that the current behavior is
highly surprising and inconsistent - and keeping in mind that this is a
subject that may affect security - I think the documentation should
reflect the current behavior at least. I thus see this as a
documentation issue.

Kind regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
Adrian Klaver
Date:
On 1/1/25 09:55, Jan Behrens wrote:
> On Sat, 28 Dec 2024 00:40:09 +0100
> Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> 
>> On Fri, 27 Dec 2024 13:26:28 -0700
>> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>>
>>>> Or is it documented somewhere?
>>>
>>> https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
>>
>> I can't find any notes regarding functions and schemas in that section.
> 
> Actually, I found another note in the documentation. But it doesn't
> explain things correctly. In the documentation for PostgreSQL 17,
> section 36.17.6.1. (Security Considerations for Extension Functions)
> says:
> 
> "SQL-language and PL-language functions provided by extensions are at
> risk of search-path-based attacks when they are executed, since parsing
> of these functions occurs at execution time not creation time."
> 
> https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-SECURITY
> 
> So here, the manual explicity states that functions are parsed at
> execution, not creation time. As seen in my original example in this
> thread, this isn't (fully) true. Moreover, it isn't true for all
> SQL-language functions, as can be demonstrated with the following code:
> 
> ============
> 
> CREATE SCHEMA s1;
> CREATE SCHEMA s2;
> 
> CREATE VIEW s1.v AS SELECT 'creation' AS col;
> CREATE VIEW s2.v AS SELECT 'runtime' AS col;
> 
> SET search_path TO 'public', 's1';
> 
> CREATE FUNCTION use_sql_atomic() RETURNS TEXT LANGUAGE sql BEGIN ATOMIC
>    SELECT 'use_sql_atomic = ' || col FROM v;
> END;
> 
> CREATE FUNCTION use_sql_string() RETURNS TEXT LANGUAGE sql AS $$
>    SELECT 'use_sql_string = ' || col FROM v;
> $$;
> 
> CREATE FUNCTION use_plpgsql() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN
>    RETURN (SELECT 'use_plpgsql = ' || col FROM v);
> END; $$;
> 
> SET search_path TO 'public', 's2';
> 
> SELECT use_sql_atomic() AS "output" UNION ALL
> SELECT use_sql_string() AS "output" UNION ALL
> SELECT use_plpgsql() AS "output";
> 
> ============
> 
> This generates the following output:
> 
>            output
> ---------------------------
>   use_sql_atomic = creation
>   use_sql_string = runtime
>   use_plpgsql = runtime
> (3 rows)
> 
> Overall, PostgreSQL doesn't behave consistent, and to me it seems that
> the documentation isn't describing its behavior correctly either.

https://www.postgresql.org/docs/current/sql-createfunction.html

"sql_body

     The body of a LANGUAGE SQL function. This can either be a single 
statement

     RETURN expression

     or a block

     BEGIN ATOMIC
       statement;
       statement;
       ...
       statement;
     END

     This is similar to writing the text of the function body as a 
string constant (see definition above), but there are some differences: 
This form only works for LANGUAGE SQL, the string constant form works 
for all languages. This form is parsed at function definition time, the 
string constant form is parsed at execution time; therefore this form 
cannot support polymorphic argument types and other constructs that are 
not resolvable at function definition time. This form tracks 
dependencies between the function and objects used in the function body, 
so DROP ... CASCADE will work correctly, whereas the form using string 
literals may leave dangling functions. Finally, this form is more 
compatible with the SQL standard and other SQL implementations.
"

> 
> I understand if fixing this is too much work (even though I would
> really like to see this fixed). But given that the current behavior is
> highly surprising and inconsistent - and keeping in mind that this is a
> subject that may affect security - I think the documentation should
> reflect the current behavior at least. I thus see this as a
> documentation issue.
> 
> Kind regards,
> Jan Behrens
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: search_path for PL/pgSQL functions partially cached?

From
"David G. Johnston"
Date:
On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
On Sat, 28 Dec 2024 00:40:09 +0100
Jan Behrens <jbe-mlist@magnetkern.de> wrote:

> On Fri, 27 Dec 2024 13:26:28 -0700
> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> > > Or is it documented somewhere?
> >
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
>
> I can't find any notes regarding functions and schemas in that section.


"Because PL/pgSQL saves prepared statements and sometimes execution plans in this way, SQL commands that appear directly in a PL/pgSQL function must refer to the same tables and columns on every execution; that is, you cannot use a parameter as the name of a table or column in an SQL command."

Changing search_path is just one possible way to change out which object a name tries to refer to so it is not called out explicitly.


"SQL-language and PL-language functions provided by extensions are at
risk of search-path-based attacks when they are executed, since parsing
of these functions occurs at execution time not creation time."
 
Moreover, it isn't true for all
SQL-language functions, as can be demonstrated with the following code:

Yeah, when we added a second method to write an SQL-language function, one that doesn't simply accept a string body, we didn't update that section to point out that is the string input variant of create function that is affected in this manner, the non-string (atomic) variant stores the result of parsing the inline code as opposed to storing the raw text.

David J.

Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Wed, 1 Jan 2025 11:19:32 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> 
> > On Sat, 28 Dec 2024 00:40:09 +0100
> > Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> >
> > > On Fri, 27 Dec 2024 13:26:28 -0700
> > > "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> > >
> > > > > Or is it documented somewhere?
> > > >
> > > >
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> > >
> > > I can't find any notes regarding functions and schemas in that section.
> >
> >
> "Because PL/pgSQL saves prepared statements and sometimes execution plans
> in this way, SQL commands that appear directly in a PL/pgSQL function must
> refer to the same tables and columns on every execution; that is, you
> cannot use a parameter as the name of a table or column in an SQL command."
> 
> Changing search_path is just one possible way to change out which object a
> name tries to refer to so it is not called out explicitly.

The first part of the cited sentence seems helpful ("you must always
refer to the same tables and columns on every execution"). I would thus
conclude that using a dynamic search_path when running functions or
procedures is *always* considered errorneous (even though not reported
by the database as an error), except when using EXECUTE.

I wonder if the database could/should generate an error (or at least a
warning?) when a function or procedure without a "SET search_path"
statement uses a non-qualified name? According to the documentation
using a dynamic search_path to refer to different entities in the
database is a case that "must" not happen.

But following through, this might lead to more warnings one might
expect, e.g. when using simple operators such as "=" or the "IN" or
"CASE expression WHEN" statements, as these rely on the search_path as
well. Should such code be considered non-idiomatic, dangerous, or even
errorneous if a "SET search_path" option is missing in the
function's/procedure's definition?

Maybe I'm overthinking this. But in practice, I've been running into
surprising issues whenever functions and schemas are involved, and I'm
not sure if every programmer will be aware of how important it is to
properly set a search_path in the function's defintion after reading
the documentation. (Besides, it's not always possible in procedures.)

> 
> > "SQL-language and PL-language functions provided by extensions are at
> > risk of search-path-based attacks when they are executed, since parsing
> > of these functions occurs at execution time not creation time."
> 
> > Moreover, it isn't true for all
> > SQL-language functions, as can be demonstrated with the following code:
> 
> Yeah, when we added a second method to write an SQL-language function, one
> that doesn't simply accept a string body, we didn't update that section to
> point out that is the string input variant of create function that is
> affected in this manner, the non-string (atomic) variant stores the result
> of parsing the inline code as opposed to storing the raw text.
> 
> David J.

I missed that other part in the manual (which is in a totally different
section). Should I report the missing update in section 36.17.6.1. of
the documentation as a documentation issue, or is it not necessary?

Kind regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
Pavel Stehule
Date:
Hi

čt 2. 1. 2025 v 11:37 odesílatel Jan Behrens <jbe-mlist@magnetkern.de> napsal:
On Wed, 1 Jan 2025 11:19:32 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
>
> > On Sat, 28 Dec 2024 00:40:09 +0100
> > Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> >
> > > On Fri, 27 Dec 2024 13:26:28 -0700
> > > "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> > >
> > > > > Or is it documented somewhere?
> > > >
> > > >
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> > >
> > > I can't find any notes regarding functions and schemas in that section.
> >
> >
> "Because PL/pgSQL saves prepared statements and sometimes execution plans
> in this way, SQL commands that appear directly in a PL/pgSQL function must
> refer to the same tables and columns on every execution; that is, you
> cannot use a parameter as the name of a table or column in an SQL command."
>
> Changing search_path is just one possible way to change out which object a
> name tries to refer to so it is not called out explicitly.

The first part of the cited sentence seems helpful ("you must always
refer to the same tables and columns on every execution"). I would thus
conclude that using a dynamic search_path when running functions or
procedures is *always* considered errorneous (even though not reported
by the database as an error), except when using EXECUTE.

I wonder if the database could/should generate an error (or at least a
warning?) when a function or procedure without a "SET search_path"
statement uses a non-qualified name? According to the documentation
using a dynamic search_path to refer to different entities in the
database is a case that "must" not happen.

But following through, this might lead to more warnings one might
expect, e.g. when using simple operators such as "=" or the "IN" or
"CASE expression WHEN" statements, as these rely on the search_path as
well. Should such code be considered non-idiomatic, dangerous, or even
errorneous if a "SET search_path" option is missing in the
function's/procedure's definition?

Maybe I'm overthinking this. But in practice, I've been running into
surprising issues whenever functions and schemas are involved, and I'm
not sure if every programmer will be aware of how important it is to
properly set a search_path in the function's defintion after reading
the documentation. (Besides, it's not always possible in procedures.)

How can you identify unwanted usage of non qualified identifiers from wanted usage of non qualified identifiers? It is a common pattern for sharding. Using not qualified identifiers of operators, functions is common when you are using orafce extensions, etc.

Using qualified identifiers everywhere strongly reduces readability. There are no aliases to the schema, so aliases cannot help.

you can identify the functions where search_path is not explicitly assigned

select oid::regprocedure 
  from pg_proc 
where pronamespace::regnamespace not in ('pg_catalog', 'information_schema') 
   and not exists(select 1 from unnest(proconfig) g(v) where  v ~ '^search_path');


Regards

Pavel


>
> > "SQL-language and PL-language functions provided by extensions are at
> > risk of search-path-based attacks when they are executed, since parsing
> > of these functions occurs at execution time not creation time."
>
> > Moreover, it isn't true for all
> > SQL-language functions, as can be demonstrated with the following code:
>
> Yeah, when we added a second method to write an SQL-language function, one
> that doesn't simply accept a string body, we didn't update that section to
> point out that is the string input variant of create function that is
> affected in this manner, the non-string (atomic) variant stores the result
> of parsing the inline code as opposed to storing the raw text.
>
> David J.

I missed that other part in the manual (which is in a totally different
section). Should I report the missing update in section 36.17.6.1. of
the documentation as a documentation issue, or is it not necessary?

Kind regards,
Jan Behrens


Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Thu, 2 Jan 2025 12:40:59 +0100
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> How can you identify unwanted usage of non qualified identifiers from
> wanted usage of non qualified identifiers? It is a common pattern for
> sharding. Using not qualified identifiers of operators, functions is common
> when you are using orafce extensions, etc.

I don't fully understand the use-case. Could you elaborate?

As I understand, even if identifiers are not fully-qualified, it is
forbidden to use the search_path to refer to different database
entities at run-time (as David pointed out).

So I don't understand how a dynamic "search_path" could be used in any
scenario within functions except when EXECUTE is involved.

> 
> Using qualified identifiers everywhere strongly reduces readability. There
> are no aliases to the schema, so aliases cannot help.

Yes, I agree on that. Using "SET search_path" in the function's
definition fixes that problem, but it's easy to miss how important this
is from reading the documentation:

The manual regarding "CREATE FUNCTION" refers to "search_path" only
within the "Writing SECURITY DEFINER Functions Safely" section. It's
easy to skip that part unless you use that feature. Moreover, that
section alone doesn't explain the weird behavior of four different
outcomes of a function with only two schemas involved which I brought
up in the beginning of this thread.

The part on "SET configuration_parameter" part in the "CREATE FUNCTION"
documentation doesn't mention the search_path or schemas. And I don't
think you can expect every programmer will read the "Plan Caching"
subsection in the "PL/pgSQL under the Hood" section. But even then, the
information is just provided indirectly.

Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't
give any hint either.

I think (assuming that the behavior isn't fixed) that some slighly more
prominent warning would be reasonable.

> 
> you can identify the functions where search_path is not explicitly assigned
> 
> select oid::regprocedure
>   from pg_proc
> where pronamespace::regnamespace not in ('pg_catalog',
> 'information_schema')
>    and not exists(select 1 from unnest(proconfig) g(v) where  v ~
> '^search_path');
> 
> 
> Regards
> 
> Pavel

Kind regards,
Jan



Re: search_path for PL/pgSQL functions partially cached?

From
Pavel Stehule
Date:


čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens <jbe-mlist@magnetkern.de> napsal:
On Thu, 2 Jan 2025 12:40:59 +0100
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> How can you identify unwanted usage of non qualified identifiers from
> wanted usage of non qualified identifiers? It is a common pattern for
> sharding. Using not qualified identifiers of operators, functions is common
> when you are using orafce extensions, etc.

I don't fully understand the use-case. Could you elaborate?

As I understand, even if identifiers are not fully-qualified, it is
forbidden to use the search_path to refer to different database
entities at run-time (as David pointed out).

So I don't understand how a dynamic "search_path" could be used in any
scenario within functions except when EXECUTE is involved.

you don't need more databases

schema one - customer x
schema two - customer y

create table one.t1(..); create table one.t2(..);
create table two.t1(..); create table two.t2(..);

set search_path to one;
-- work with data set of customer x

set search_path to two;
-- work wit data set of customer y

some times can be pretty ineffective to have database per customer - more connect, disconnect in postgres is much more expensive than SET search_path TO .. and maybe RESET plans;


 

>
> Using qualified identifiers everywhere strongly reduces readability. There
> are no aliases to the schema, so aliases cannot help.

Yes, I agree on that. Using "SET search_path" in the function's
definition fixes that problem, but it's easy to miss how important this
is from reading the documentation:

The manual regarding "CREATE FUNCTION" refers to "search_path" only
within the "Writing SECURITY DEFINER Functions Safely" section. It's
easy to skip that part unless you use that feature. Moreover, that
section alone doesn't explain the weird behavior of four different
outcomes of a function with only two schemas involved which I brought
up in the beginning of this thread.

The part on "SET configuration_parameter" part in the "CREATE FUNCTION"
documentation doesn't mention the search_path or schemas. And I don't
think you can expect every programmer will read the "Plan Caching"
subsection in the "PL/pgSQL under the Hood" section. But even then, the
information is just provided indirectly.


yes, probably nobody reads the plan caching doc. And if they read it, then because they have performance problems.

 
Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't
give any hint either.

This is a question - this is a generic feature in Postgres.  Every query can be impacted by setting of search_path.

From my perspective, there can be a note in the documentation related to copy types and row types.


The problem that you found is not just about the change of search_path. Same problem can be found after altering the table.

Regards

Pavel


I think (assuming that the behavior isn't fixed) that some slighly more
prominent warning would be reasonable.

>
> you can identify the functions where search_path is not explicitly assigned
>
> select oid::regprocedure
>   from pg_proc
> where pronamespace::regnamespace not in ('pg_catalog',
> 'information_schema')
>    and not exists(select 1 from unnest(proconfig) g(v) where  v ~
> '^search_path');
>
>
> Regards
>
> Pavel

Kind regards,
Jan

Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Thu, 2 Jan 2025 13:48:29 +0100
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens <jbe-mlist@magnetkern.de>
> napsal:
> 
> > On Thu, 2 Jan 2025 12:40:59 +0100
> > Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >
> > > How can you identify unwanted usage of non qualified identifiers from
> > > wanted usage of non qualified identifiers? It is a common pattern for
> > > sharding. Using not qualified identifiers of operators, functions is
> > common
> > > when you are using orafce extensions, etc.
> >
> > I don't fully understand the use-case. Could you elaborate?
> >
> > As I understand, even if identifiers are not fully-qualified, it is
> > forbidden to use the search_path to refer to different database
> > entities at run-time (as David pointed out).
> >
> > So I don't understand how a dynamic "search_path" could be used in any
> > scenario within functions except when EXECUTE is involved.
> >
> 
> you don't need more databases
> 
> schema one - customer x
> schema two - customer y
> 
> create table one.t1(..); create table one.t2(..);
> create table two.t1(..); create table two.t2(..);
> 
> set search_path to one;
> -- work with data set of customer x
> 
> set search_path to two;
> -- work wit data set of customer y
> 
> some times can be pretty ineffective to have database per customer - more
> connect, disconnect in postgres is much more expensive than SET search_path
> TO .. and maybe RESET plans;

I guess that means there is a practical application where search_path
MAY change at runtime IF done in different sessions or if the cache is
reset using the DISCARD command:

https://www.postgresql.org/docs/17/sql-discard.html

I assume DISCARD PLANS would be the right command?

This seems to be a very special case though. I think there should be a
warning in the documentation of CREATE FUNCTION with regard to schemas
anyway, though.

Regards,
Jan



Re: search_path for PL/pgSQL functions partially cached?

From
Pavel Stehule
Date:
Hi
 
>
> some times can be pretty ineffective to have database per customer - more
> connect, disconnect in postgres is much more expensive than SET search_path
> TO .. and maybe RESET plans;

I guess that means there is a practical application where search_path
MAY change at runtime IF done in different sessions or if the cache is
reset using the DISCARD command:

https://www.postgresql.org/docs/17/sql-discard.html

I assume DISCARD PLANS would be the right command?

that depends. plan inside plan cache is invalidated when search_path is different. You use RESET plans because you want to release all plans quickly.

Unfortunately, the types assigned to plpgsql variables are not invalidated. This is the source of problems. It is a classical problem - it is hard to say when you should invalidate cache.
Current design is not ideal - but it is almost a good enough compromise between correctness and performance. It is true, so nobody did some work to fix it. So maybe the impact to performance should not be too bad, but it is not an easy issue. plans are isolated - and the impact of one plan to the second plan is zero. For variables it is exactly opposite.
 

This seems to be a very special case though. I think there should be a
warning in the documentation of CREATE FUNCTION with regard to schemas
anyway, though.

I am not sure. If you want to use this warning, then it should be everywhere where any non-qualified identifier can be used. Maybe in plpgsql can be more accented so almost everything in plpgsql depends on the current setting of search_path. Lot of people don't understand, so every expression in plpgsql is SQL and every expression is executed like part of a query. And unfortunately there are some different caches - plpgsql cache and plan cache and both caches are invalidated at different times (I think so plpgsql cache is not resetted by RESET PLANS). Maybe it is better to explain how plpgsql works. It is a little bit different from well known interpreted languages.


Regards,
Jan

Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Sat, 28 Dec 2024 00:40:09 +0100
Jan Behrens <jbe-mlist@magnetkern.de> wrote:

> > Add qualification or attach a “set search_path” clause to “create
> > function”.  Code stored in the server should not rely on the session
> > search_path.
> > 
> > David J.

I have been trying to adjust some of my code, and I still have cases
where I have to rely on the session's search_path. I'll provide an
example below.

> 
> [...]
> 
> My question is: Am I safe if I use fully-qualified types in the DECLARE
> section only? Or do I need to provide full qualification also in the
> code below (after SET search_path TO 'myschema')?
> 
> And bonus question: Is it documented somewhere?
>
> [...]
> 
> Kind Regards
> Jan Behrens

The following code is taken from a project I'm currently working on:

============

-- Let's assume we don't know the name of the schema in which the
-- "pgratio" extension with the RATIONAL data type is installed.
CREATE SCHEMA "qwertyuiop";
CREATE EXTENSION "pgratio" WITH SCHEMA "qwertyuiop";

-- This installs schema "myschema" with some dynamic function:
BEGIN;

CREATE SCHEMA "myschema";
SET LOCAL search_path TO "myschema";

-- Append schema of "pgratio" extension, which provides the RATIONAL
-- data type, to search_path:
SELECT set_config(
  'search_path',
  current_setting('search_path') || ', ' || quote_ident(nspname),
  TRUE
) FROM pg_namespace, pg_extension
  WHERE pg_namespace.oid = extnamespace AND extname = 'pgratio';

CREATE DOMAIN "rational_wrapper" AS RATIONAL;

CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS RATIONAL

  --------------------------------------------------------------------
  -- I cannot use SET search_path FROM CURRENT here, because "query_p"
  -- shall refer to tables in the search_path of the caller.
  --------------------------------------------------------------------

  LANGUAGE plpgsql AS $$
    DECLARE
      "old_search_path" TEXT;

      ----------------------------------------------------------------
      -- I have to fully qualify the following type.
      -- Moreover, I can't use RATIONAL as I don't know its schema.
      ----------------------------------------------------------------

      "result" "myschema"."rational_wrapper";
    BEGIN
      SELECT current_setting('search_path') INTO "old_search_path";
      PERFORM set_config(
        'search_path',
        'myschema, ' || quote_ident(nspname) || ', pg_temp, ' ||
        "old_search_path",
        TRUE
      ) FROM pg_namespace, pg_extension
        WHERE pg_namespace.oid = extnamespace AND extname = 'pgratio';

      ----------------------------------------------------------------
      -- Is it safe to not fully qualify type RATIONAL below?
      -- And, if yes, where in the documentation is this explained?
      ----------------------------------------------------------------

      CREATE TEMP TABLE "mytemptable" ("val" RATIONAL);
      EXECUTE 'INSERT INTO "mytemptable" '
        'SELECT "query"."a" * "query"."b" '
        'FROM (' || "query_p" || ') AS "query"';
      -- Do some things here.
      SELECT sum("val") INTO "result" FROM "mytemptable";
      PERFORM set_config('search_path', "old_search_path", TRUE);
      RETURN "result";
    END;
  $$;

COMMIT;

CREATE TABLE "tbl" ("foo" INT8, "bar" INT8);
INSERT INTO "tbl" VALUES (5, 7), (1, 10);

SELECT "myschema"."some_function"(
  'SELECT "foo" AS "a", "bar" AS "b" FROM "tbl"'
);
\c
SELECT "myschema"."some_function"(
  'SELECT "foo" AS "a", "bar" AS "b" FROM "tbl"'
);

============

The code for the pgratio extension that provides the RATIONAL data type
is found here: https://www.public-software-group.org/pgratio

Running that code on my machine correctly gives:

some_function 
---------------
 45
(1 row)

You are now connected to database "jbe" as user "jbe".
 some_function 
---------------
 45
(1 row)

Because extensions can only be installed in one schema, it may be a bad
idea to have a component requiring an extension to be installed in a
particular schema (because if different components have different
expectations on the schema name, e.g. some might expect "pgratio" to be
installed in "public" and others might expect it in "pgratio" or some
other schema such as "qwertyuiop", this would lead to an unresolvable
conflict).

I would like to know if the above example is correct. It seems overall
bulky, but I haven't found a better way, assuming that it can be
unknown where a particular extension has been installed to. In
particular I feel a bit insecure about where I have to fully qualify,
and where not. See the comments in the code above.

Note that I want the function to accept a query that makes sense in the
caller's search_path. Thus using "SET search_path FROM CURRENT" is not
an option for me, I believe.

Regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
"David G. Johnston"
Date:
On Friday, January 3, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
I would like to know if the above example is correct. It seems overall
bulky, but I haven't found a better way, assuming that it can be
unknown where a particular extension has been installed to. In
particular I feel a bit insecure about where I have to fully qualify,
and where not. See the comments in the code above.

Short answer, you cannot looking at a definition and know the answer - whether the code is going to be executed in a sanitized search_path is what matters.  Anything that would be executed during pg_restore has to be made safe.  Therefore, code that is only ever executed by applications directly can use swarch_path.

I’d probably modify the function signature to take search_path as a second optional argument and then invoke a set search_path within the function.  At worse the caller can place current_setting(search_path) as the value of that argument though being explicit would be recommended.

David J.

Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Fri, 3 Jan 2025 08:34:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> On Friday, January 3, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> >
> > I would like to know if the above example is correct. It seems overall
> > bulky, but I haven't found a better way, assuming that it can be
> > unknown where a particular extension has been installed to. In
> > particular I feel a bit insecure about where I have to fully qualify,
> > and where not. See the comments in the code above.
> 
> 
> Short answer, you cannot looking at a definition and know the answer -
> whether the code is going to be executed in a sanitized search_path is what
> matters.

I don't understand. Do you mean my last example is wrong / insecure?
If so, why?

> Anything that would be executed during pg_restore has to be made
> safe.  Therefore, code that is only ever executed by applications directly
> can use swarch_path.

Why should the function be executed during pg_restore?

> 
> I’d probably modify the function signature to take search_path as a second
> optional argument and then invoke a set search_path within the function.
> At worse the caller can place current_setting(search_path) as the value of
> that argument though being explicit would be recommended.
> 
> David J.

I could do that, but I would like to understand if that is really
necessary as it makes the interface more complicated, and I would like
to avoid unnecessary complexity in my interface.

Is it really impossible to have functions without SET search_path in
the definition of a PL/pgSQL function if I fully-qualify all types in
the DECLARE section and if all other non-qualified identifiers occur
after set_config('search_path', ...)?

Kind regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
"David G. Johnston"
Date:
On Fri, Jan 3, 2025 at 9:48 AM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
On Fri, 3 Jan 2025 08:34:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> On Friday, January 3, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> >
> > I would like to know if the above example is correct. It seems overall
> > bulky, but I haven't found a better way, assuming that it can be
> > unknown where a particular extension has been installed to. In
> > particular I feel a bit insecure about where I have to fully qualify,
> > and where not. See the comments in the code above.
>
>
> Short answer, you cannot looking at a definition and know the answer -
> whether the code is going to be executed in a sanitized search_path is what
> matters.

I don't understand. Do you mean my last example is wrong / insecure?
If so, why?

It is at risk because it depends on the session search_path.  That is all.  Whether that risk turns into a failure to execute depends on how/when it is executed.  I'm not that comfortable talking about security risks in this context though the current design goal is to mitigate such security issues by setting things up so the function execution fails rather than is executed insecurely.  This is presently mainly done by setting the search_path to just effectively pg_catalog before executing the query, breaking any code depending on other schemas existing in the search_path.


> Anything that would be executed during pg_restore has to be made
> safe.  Therefore, code that is only ever executed by applications directly
> can use swarch_path.

Why should the function be executed during pg_restore?

If the function is used in building an index, or a materialized view, are the common cases.  Trigger functions too.

Note, this is talking about evaluating functions generally, not the one provided here specifically.

 

I could do that, but I would like to understand if that is really
necessary as it makes the interface more complicated, and I would like
to avoid unnecessary complexity in my interface.

Is it really impossible to have functions without SET search_path in
the definition of a PL/pgSQL function if I fully-qualify all types in
the DECLARE section and if all other non-qualified identifiers occur
after set_config('search_path', ...)?

If you add a set_config to the body of the function then you indeed avoid the problem.  It is basically equivalent to adding a SET clause to the create function command.  In this case even when the function is executed in a sanitized search_path environment (such as the one established by pg_restore) you are not relying on it.  That non-reliance is all that really matters.

David J.
 

Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Fri, 3 Jan 2025 10:16:15 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> It is at risk because it depends on the session search_path.  That is all.
> Whether that risk turns into a failure to execute depends on how/when it is
> executed.  I'm not that comfortable talking about security risks in this
> context though the current design goal is to mitigate such security issues
> by setting things up so the function execution fails rather than is
> executed insecurely.  This is presently mainly done by setting the
> search_path to just effectively pg_catalog before executing the query,
> breaking any code depending on other schemas existing in the search_path.

I'm not sure if there is a misunderstanding. In my last example (e-mail
dated Fri, 3 Jan 2025 13:53:32 +0100), the user who has control over
the contents of the "query_p" argument is an application programmer,
not a real end-user. The function is also *not* marked as SECURITY
DEFINER, so it always runs with the privileges of the caller. I don't
see any specific security risk here, except that I'm unsure if the
function is written properly with regard to qualification of the used
types after PL/pgSQL's BEGIN. As I learned, I must fully-qualify types
*before* the BEGIN, i.e. in the DECLARE section. But does this also
hold for types after the BEGIN when I previously ensure that the
search_path is correctly set (set within the function's body)?

> 
> > > Anything that would be executed during pg_restore has to be made
> > > safe.  Therefore, code that is only ever executed by applications
> > directly
> > > can use swarch_path.
> >
> > Why should the function be executed during pg_restore?
> 
> 
> If the function is used in building an index, or a materialized view, are
> the common cases.  Trigger functions too.
> 
> Note, this is talking about evaluating functions generally, not the one
> provided here specifically.

I don't think my function would be evaluated during a pg_restore then.

> 
> > I could do that, but I would like to understand if that is really
> > necessary as it makes the interface more complicated, and I would like
> > to avoid unnecessary complexity in my interface.
> >
> > Is it really impossible to have functions without SET search_path in
> > the definition of a PL/pgSQL function if I fully-qualify all types in
> > the DECLARE section and if all other non-qualified identifiers occur
> > after set_config('search_path', ...)?
> >
> If you add a set_config to the body of the function then you indeed avoid
> the problem.  It is basically equivalent to adding a SET clause to the
> create function command.  In this case even when the function is executed
> in a sanitized search_path environment (such as the one established by
> pg_restore) you are not relying on it.  That non-reliance is all that
> really matters.
> 
> David J.

But if I use "SET search_path FROM CURRENT", then the called function
won't know the search_path that is set at the caller's side (which is
what I need to make my interface nice to use).

I would prefer my current solution, but I would like to be sure that my
example (the one in my e-mail dated Fri, 3 Jan 2025 13:53:32 +0100) is
correct. I still am not sure about that.

Kind Regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
Adrian Klaver
Date:
On 1/3/25 13:33, Jan Behrens wrote:
> On Fri, 3 Jan 2025 10:16:15 -0700
> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> 

> 
> But if I use "SET search_path FROM CURRENT", then the called function
> won't know the search_path that is set at the caller's side (which is
> what I need to make my interface nice to use).

At this point I am lost as to what the overall goal of this is.

Can you provide a 10000 ft view if what it is you are trying to achieve?


> 
> I would prefer my current solution, but I would like to be sure that my
> example (the one in my e-mail dated Fri, 3 Jan 2025 13:53:32 +0100) is
> correct. I still am not sure about that.
> 
> Kind Regards,
> Jan Behrens
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: search_path for PL/pgSQL functions partially cached?

From
"David G. Johnston"
Date:
On Fri, Jan 3, 2025 at 2:33 PM Jan Behrens <jbe-mlist@magnetkern.de> wrote:
I would prefer my current solution, but I would like to be sure that my
example (the one in my e-mail dated Fri, 3 Jan 2025 13:53:32 +0100) is
correct. I still am not sure about that.


If it does what you want and it is only ever executed by application code over a client connection you should be fine.  Your client connection will always have whatever search_path you arrange to have in place and the application developer will know whether their environment is correct or not when they test it, and have the ability to change their environment as needed.

David J.

Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Fri, 3 Jan 2025 13:56:02 -0800
Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> At this point I am lost as to what the overall goal of this is.
> 
> Can you provide a 10000 ft view if what it is you are trying to achieve?

Sure! I would like to create a component (e.g. a PostgreSQL extension)
that provides a function which processes some complex data, without
making any requirements regarding where the data is stored. To pass
this data to the function, I could use arrays of composite types, but
that seems to be very bulky. Another option would be to use cursors,
but that didn't turn out to work very smooth either.

Instead, I plan to expect the function to receive a query string that
will get the data that is being processed by the function.

That query string should be allowed to refer to tables in the
search_path at the caller's side.

Therefore, I cannot use the "SET search_path FROM CURRENT" in my
"CREATE FUNCTION" statement, because it would overwrite the current
search_path on each call of the function.

Thus my idea is to do this (simplified):

CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type"

  --------------------------------------------------------------------
  -- I cannot use SET search_path FROM CURRENT here, because "query_p"
  -- shall refer to tables in the search_path of the caller.
  --------------------------------------------------------------------

  LANGUAGE plpgsql AS $$
    DECLARE
      "old_search_path" TEXT;

      ----------------------------------------------------------------
      -- I have to fully qualify types in the DECLARE section.
      ----------------------------------------------------------------

      "some_variable" "some_schema"."some_type";
    BEGIN
      SELECT current_setting('search_path') INTO "old_search_path";
      PERFORM set_config(
        'search_path',
        'some_schema, pg_temp, ' || "old_search_path",
        TRUE
      );

      ----------------------------------------------------------------
      -- Do I have to fully qualify types and operators from
      -- "myschema" here? Or is it safe to not fully qualify them?
      ----------------------------------------------------------------
    END;
  $$;

That is my overall idea.

My problem is that I'm confused about WHEN EXACTLY I have to qualify
tables/types, etc. It is very hard to understand from reading (just) the
documentation.

Kind Regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Sat, 4 Jan 2025 00:22:03 +0100
Jan Behrens <jbe-mlist@magnetkern.de> wrote:

>       ----------------------------------------------------------------
>       -- Do I have to fully qualify types and operators from
>       -- "myschema" here? Or is it safe to not fully qualify them?
>       ----------------------------------------------------------------

This was meant to read:

...from "some_schema" here.



Re: search_path for PL/pgSQL functions partially cached?

From
Isaac Morland
Date:
On Fri, 3 Jan 2025 at 18:22, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
 
Instead, I plan to expect the function to receive a query string that
will get the data that is being processed by the function.

That query string should be allowed to refer to tables in the
search_path at the caller's side.

Therefore, I cannot use the "SET search_path FROM CURRENT" in my
"CREATE FUNCTION" statement, because it would overwrite the current
search_path on each call of the function.

 I wonder if it would help if EXECUTE took an optional search_path to use while executing the query.

Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Fri, 3 Jan 2025 18:36:13 -0500
Isaac Morland <isaac.morland@gmail.com> wrote:

> On Fri, 3 Jan 2025 at 18:22, Jan Behrens <jbe-mlist@magnetkern.de> wrote:
> 
> 
> > Instead, I plan to expect the function to receive a query string that
> > will get the data that is being processed by the function.
> >
> > That query string should be allowed to refer to tables in the
> > search_path at the caller's side.
> >
> > Therefore, I cannot use the "SET search_path FROM CURRENT" in my
> > "CREATE FUNCTION" statement, because it would overwrite the current
> > search_path on each call of the function.
> >
> 
>  I wonder if it would help if EXECUTE took an optional search_path to use
> while executing the query.

That wouldn't solve my problem, because the function that includes the
EXECUTE still needs to know the search_path set on the caller side.

This only works if I omit the "SET search_path FROM CURRENT" option in
the function's definition OR if I pass a search_path as an argument. I
guess I could write a wrapper:

============

BEGIN;

CREATE SCHEMA "some_schema";
SET LOCAL search_path TO "some_schema";

CREATE TYPE "some_type" AS ("name" TEXT, "height" INT8);

CREATE FUNCTION "foo_impl"("query_p" TEXT, "search_path_p" TEXT)
  RETURNS "some_type"
  LANGUAGE plpgsql SET search_path FROM CURRENT AS $$
    DECLARE
      "old_search_path" TEXT;
      "result" "some_type";
    BEGIN
      "old_search_path" = current_setting('search_path');
      PERFORM set_config('search_path', "search_path_p", TRUE);
      EXECUTE "query_p" INTO "result";
      PERFORM set_config('search_path', "old_search_path", TRUE);
      RETURN "result";
    END;
  $$;

CREATE FUNCTION "foo"("query_p" TEXT)
  RETURNS "some_type"
  RETURN "foo_impl"("query_p", current_setting('search_path'));

COMMIT;

CREATE TABLE "tbl" ("id" SERIAL8, "name" TEXT, "height" INT8);
INSERT INTO "tbl" ("name", "height") VALUES ('Alice', 200);

SELECT * FROM "some_schema"."foo"('SELECT "name" FROM "tbl"');

============

Not sure which variant (this or my previous attempt) is better and if
either is safe/correct.

Regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
Adrian Klaver
Date:
On 1/3/25 15:22, Jan Behrens wrote:
> On Fri, 3 Jan 2025 13:56:02 -0800
> Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 
>> At this point I am lost as to what the overall goal of this is.
>>
>> Can you provide a 10000 ft view if what it is you are trying to achieve?
> 
> Sure! I would like to create a component (e.g. a PostgreSQL extension)
> that provides a function which processes some complex data, without
> making any requirements regarding where the data is stored. To pass
> this data to the function, I could use arrays of composite types, but
> that seems to be very bulky. Another option would be to use cursors,
> but that didn't turn out to work very smooth either.
> 
> Instead, I plan to expect the function to receive a query string that
> will get the data that is being processed by the function.
> 
> That query string should be allowed to refer to tables in the
> search_path at the caller's side.
> 
> Therefore, I cannot use the "SET search_path FROM CURRENT" in my
> "CREATE FUNCTION" statement, because it would overwrite the current
> search_path on each call of the function.
> 
> Thus my idea is to do this (simplified):
> 
> CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type"
> 
>    --------------------------------------------------------------------
>    -- I cannot use SET search_path FROM CURRENT here, because "query_p"
>    -- shall refer to tables in the search_path of the caller.
>    --------------------------------------------------------------------
> 
>    LANGUAGE plpgsql AS $$
>      DECLARE
>        "old_search_path" TEXT;
> 
>        ----------------------------------------------------------------
>        -- I have to fully qualify types in the DECLARE section.
>        ----------------------------------------------------------------
> 
>        "some_variable" "some_schema"."some_type";
>      BEGIN
>        SELECT current_setting('search_path') INTO "old_search_path";
>        PERFORM set_config(
>          'search_path',
>          'some_schema, pg_temp, ' || "old_search_path",
>          TRUE
>        );
> 
>        ----------------------------------------------------------------
>        -- Do I have to fully qualify types and operators from
>        -- "myschema" here? Or is it safe to not fully qualify them?
>        ----------------------------------------------------------------
>      END;
>    $$;
> 
> That is my overall idea.

Is 'some_schema' a known item when installing?

Once you have the search_path defined and assuming all the objects you 
want are in that path, then yes you can drop the schema qualification.

> 
> My problem is that I'm confused about WHEN EXACTLY I have to qualify
> tables/types, etc. It is very hard to understand from reading (just) the
> documentation.

If you are doing this as an extension then I suspect you want the 
processes shown here:

https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-RELOCATION


> 
> Kind Regards,
> Jan Behrens

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Sat, 4 Jan 2025 09:37:14 -0800
Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 1/3/25 15:22, Jan Behrens wrote:
> > On Fri, 3 Jan 2025 13:56:02 -0800
> > Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> > 
> >> At this point I am lost as to what the overall goal of this is.
> >>
> >> Can you provide a 10000 ft view if what it is you are trying to achieve?
> > 
> > Sure! I would like to create a component (e.g. a PostgreSQL extension)
> > that provides a function which processes some complex data, without
> > making any requirements regarding where the data is stored. To pass
> > this data to the function, I could use arrays of composite types, but
> > that seems to be very bulky. Another option would be to use cursors,
> > but that didn't turn out to work very smooth either.
> > 
> > Instead, I plan to expect the function to receive a query string that
> > will get the data that is being processed by the function.
> > 
> > That query string should be allowed to refer to tables in the
> > search_path at the caller's side.
> > 
> > Therefore, I cannot use the "SET search_path FROM CURRENT" in my
> > "CREATE FUNCTION" statement, because it would overwrite the current
> > search_path on each call of the function.
> > 
> > Thus my idea is to do this (simplified):
> > 
> > CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type"
> > 
> >    --------------------------------------------------------------------
> >    -- I cannot use SET search_path FROM CURRENT here, because "query_p"
> >    -- shall refer to tables in the search_path of the caller.
> >    --------------------------------------------------------------------
> > 
> >    LANGUAGE plpgsql AS $$
> >      DECLARE
> >        "old_search_path" TEXT;
> > 
> >        ----------------------------------------------------------------
> >        -- I have to fully qualify types in the DECLARE section.
> >        ----------------------------------------------------------------
> > 
> >        "some_variable" "some_schema"."some_type";
> >      BEGIN
> >        SELECT current_setting('search_path') INTO "old_search_path";
> >        PERFORM set_config(
> >          'search_path',
> >          'some_schema, pg_temp, ' || "old_search_path",
> >          TRUE
> >        );
> > 
> >        ----------------------------------------------------------------
> >        -- Do I have to fully qualify types and operators from
> >        -- "myschema" here? Or is it safe to not fully qualify them?
 (correction: "some_schema")
> >        ----------------------------------------------------------------
> >      END;
> >    $$;
> > 
> > That is my overall idea.
> 
> Is 'some_schema' a known item when installing?

Yes, fortunately "some_schema" is a fixed name.

> 
> Once you have the search_path defined and assuming all the objects you 
> want are in that path, then yes you can drop the schema qualification.

That would be nice, but it doesn't seem to be the case. At least not
always. I constructed the following new example:

============

CREATE TABLE "tbl" ("col" NUMERIC(15, 0));

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
  BEGIN
    RETURN '2.4';
  END;
$$;

BEGIN;

CREATE SCHEMA "myschema";
SET LOCAL search_path TO 'myschema';

CREATE TABLE "tbl" ("col" NUMERIC);

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
  BEGIN
    RETURN '5.4';
  END;
$$;

CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$
  DECLARE
    "old_search_path" TEXT;
  BEGIN
    "old_search_path" := current_setting('search_path');
    SET LOCAL search_path TO "myschema";
    -- At this point, search_path is always set to 'myschema'!
    DECLARE
      "variable" "tbl"."col"%TYPE;
    BEGIN
      "variable" := "foo"();
      RETURN "variable";
    END;
    PERFORM set_config('search_path', "old_search_path", TRUE);
  END;
$$;

COMMIT;

SELECT "myschema"."run"(); -- returns '5.4' (when run in the same session)

-- reconnect to database here:
\c

SELECT "myschema"."run"(); -- returns '5'
SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5'

-- reconnect to database again:
\c

SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5.4'
SET search_path TO 'public';
SELECT "myschema"."run"(); -- returns '5.4'

============

Even if

DECLARE "variable" "tbl"."col"%TYPE;

follows *after* the schema is set to "myschema" in the example above, I
still get differing results, depending on how the search_path was set
when the function was first called.

I think this has to do with the fact that the overall structure and
probably types(?) are parsed first?

As Tom Lane wrote on Fri, 27 Dec 2024 16:03:17 -0500, "the types of
plpgsql variables are only looked up on the first use (within a
session)."

Does this apply to *all* types (e.g. types used in type-casts in
statements after BEGIN)? Or does it only apply to types in the DECLARE
section?

Maybe my most recent example is somewhat "crafted", but it makes me
feel insecure about what I can rely on. Could someone explain to me
what the exact rules are, or where to find them? I don't seem to
understand the exact behavior from reading the docs.

I re-read section 41.11.2. on Plan Caching:

"The PL/pgSQL interpreter parses the function's source text and
produces an internal binary instruction tree the first time the
function is called (within each session). The instruction tree fully
translates the PL/pgSQL statement structure, but individual SQL
expressions and SQL commands used in the function are not translated
immediately.
As each expression and SQL command is first executed in the function,
the PL/pgSQL interpreter parses and analyzes the command to create a
prepared statement, using the SPI manager's SPI_prepare function.
Subsequent visits to that expression or command reuse the prepared
statement. [...]"

It isn't specific about how DECLARE blocks are handled.

> 
> > 
> > My problem is that I'm confused about WHEN EXACTLY I have to qualify
> > tables/types, etc. It is very hard to understand from reading (just) the
> > documentation.
> 
> If you are doing this as an extension then I suspect you want the 
> processes shown here:
> 
> https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-RELOCATION

Yes, I'm aware of that, but the code defining the function is not part
of an extension in my case (only the RATIONAL type is part of an
extension). But thank you for pointing this out. I have been using
@extschema@ in extension code before, and recently also learned about
the @extschema:name@ syntax.

> -- 
> Adrian Klaver
> adrian.klaver@aklaver.com

Many thanks already for all your input.

Regards,
Jan Behrens



Re: search_path for PL/pgSQL functions partially cached?

From
"David G. Johnston"
Date:
On Saturday, January 4, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:

Even if

DECLARE "variable" "tbl"."col"%TYPE;

follows *after* the schema is set to "myschema" in the example above, I
still get differing results, depending on how the search_path was set
when the function was first called.

I think this has to do with the fact that the overall structure and
probably types(?) are parsed first?

I concur that this dynamic doesn’t seem to be discussed.  Namely that in the presence of nested blocks the parse phase resolves placeholders for all declared variables without executing any expressions in the body of the function; therefore all types will be resolved seeing the same search_path, namely that of the calling session or established using SET.  Changing the search_path within an outer function body block will not affect declarations within an inner block. (I am not sure whether the for-loop cases are exceptional in this.)

David J.

Re: search_path for PL/pgSQL functions partially cached?

From
"David G. Johnston"
Date:
On Saturday, January 4, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:

CREATE FUNCTION "foo_impl"("query_p" TEXT, "search_path_p" TEXT)
  RETURNS "some_type"
  LANGUAGE plpgsql SET search_path FROM CURRENT AS $$
    DECLARE
      "old_search_path" TEXT;
      "result" "some_type";
    BEGIN
      "old_search_path" = current_setting('search_path');
      PERFORM set_config('search_path', "search_path_p", TRUE);
      EXECUTE "query_p" INTO "result";
      PERFORM set_config('search_path', "old_search_path", TRUE);
      RETURN "result";
    END;
  $$;

You might consider adding a polymorphic argument for the result type.  Then if you call the function with two different typed inputs it will be cached once for each.

 Likewise, functions having polymorphic argument types have a separate statement cache for each combination of actual argument types they have been invoked for, so that data type differences do not cause unexpected failures.”

David J.
 

Re: search_path for PL/pgSQL functions partially cached?

From
"David G. Johnston"
Date:
On Saturday, January 4, 2025, Jan Behrens <jbe-mlist@magnetkern.de> wrote:

I re-read section 41.11.2. on Plan Caching:

"The PL/pgSQL interpreter parses the function's source text and
produces an internal binary instruction tree the first time the
function is called (within each session). The instruction tree fully
translates the PL/pgSQL statement structure,

The type of a plpgsql variable is by definition its structure; established in a statement, so this is actually covered by that paragraph.  But I would be for adding a bit more specific terminology here.

David J.

Re: search_path for PL/pgSQL functions partially cached?

From
Laurenz Albe
Date:
On Sun, 2025-01-05 at 00:12 +0100, Jan Behrens wrote:
> I constructed the following new example:
>
> ============
>
> CREATE TABLE "tbl" ("col" NUMERIC(15, 0));
>
> CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
>   BEGIN
>     RETURN '2.4';
>   END;
> $$;
>
> BEGIN;
>
> CREATE SCHEMA "myschema";
> SET LOCAL search_path TO 'myschema';
>
> CREATE TABLE "tbl" ("col" NUMERIC);
>
> CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
>   BEGIN
>     RETURN '5.4';
>   END;
> $$;
>
> CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$
>   DECLARE
>     "old_search_path" TEXT;
>   BEGIN
>     "old_search_path" := current_setting('search_path');
>     SET LOCAL search_path TO "myschema";
>     -- At this point, search_path is always set to 'myschema'!
>     DECLARE
>       "variable" "tbl"."col"%TYPE;
>     BEGIN
>       "variable" := "foo"();
>       RETURN "variable";
>     END;
>     PERFORM set_config('search_path', "old_search_path", TRUE);
>   END;
> $$;
>
> COMMIT;
>
> Even if
>
> DECLARE "variable" "tbl"."col"%TYPE;
>
> follows *after* the schema is set to "myschema" in the example above, I
> still get differing results, depending on how the search_path was set
> when the function was first called.

So what you should do is set the "search_path" *on* the function, not *in*
the function:

CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql
SET search_path = myschema
AS $$
  DECLARE
    "variable" "tbl"."col"%TYPE;
  BEGIN
    "variable" := "foo"();
    RETURN "variable";
  END;
$$;

Yours,
Laurenz Albe



Re: search_path for PL/pgSQL functions partially cached?

From
Jan Behrens
Date:
On Sun, 05 Jan 2025 07:48:56 +0100
Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> So what you should do is set the "search_path" *on* the function, not *in*
> the function:
> 
> CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql
> SET search_path = myschema
> AS $$
>   DECLARE
>     "variable" "tbl"."col"%TYPE;
>   BEGIN
>     "variable" := "foo"();
>     RETURN "variable";
>   END;
> $$;
> 
> Yours,
> Laurenz Albe

Yes, that's what works and what I would also do whenever possible
(probably in the form "SET search_path FROM CURRENT").

Summarizing the remaining thread, some issues are:

* The documentation isn't providing a prominent warning that behavior
  can be surprising if "SET search_path" is not used in the function's
  or procedure's defintion. (E.g. searching for "schema" in the
  documentation page for "CREATE FUNCTION" doesn't give any helpful
  hints or warning.)

* Things get more complicated when it's impossible to use
  "SET search_path" in the function's/procedure's definition, for which
  there are two scenarios:

  Scenario 1: The function or procedure needs or wants to access or use
              the search_path of the caller.

  Scenario 2: A procedure wants to execute transactional statements
              such as COMMIT or ROLLBACK within its body.

  In scenario 1, using "SET search_path" will overwrite the caller's
  search_path at runtime, so I cannot access it. (In my post from Sat,
  4 Jan 2025 14:23:10 +0100, I have proposed a wrapper function to work
  around that.)

  In scenario 2, using "SET search_path" is simply not possible and
  will be rejected by PostgreSQL.

* It is a bit unclear how the exact behavior is when I set a
  search_path from within the functions body (e.g. due to one of the two
  scenarios above). There are some examples that show some quite
  surprising behavior, at least if you don't fully understand the
  plan caching mechanism that is used.

Kind regards,
Jan Behrens