Thread: User's responsibility when using a chain of "immutable" functions?
I’ve copied my self-contained testcase at the end.
I create three functions, marking each of them "immutable". "f1()" simply returns the manifest constant 'dog'. So it seems perfectly honest to mark it as I did. "f2()" simply returns "f1()"—so, here too, it seems that I'm being honest. But I do see that I'm using human reasoning, and that Postgres cannot check that I'm right. In the same way, and with the same reasoning for my marking, "f3()" returns "f2()".
Then I do this:
select rpad('at start', 30) as history, f1(), f2(), f3();
\t off
drop function if exists f3() cascade;
drop function if exists f2() cascade;
drop function if exists f1() cascade;
create function f1()
returns text
immutable
language plpgsql
as $body$
begin
return 'dog';
end;
$body$;
create function f2()
returns text
immutable
language plpgsql
as $body$
begin
return f1();
end;
$body$;
create function f3()
returns text
immutable
language plpgsql
as $body$
begin
return f2();
end;
$body$;
select rpad('at start', 30) as history, f1(), f2(), f3();
\t on
drop function f1() cascade;
create function f1()
returns text
immutable
language plpgsql
as $body$
begin
return 'cat';
end;
$body$;
select rpad('after drop, re-create f1()', 30) as history, f1(), f2(), f3();
create or replace function f3()
returns text
immutable
language plpgsql
as $body$
declare
t1 constant text := f2();
begin
return t1;
end;
$body$;
select rpad('after create-and-replace f3()', 30) as history, f1(), f2(), f3();
Then I drop, and then re-create "f(1)", now returning 'cat', and do this:
select rpad('after drop, re-create f1()', 30) as history, f1(), f2(), f3();
Finally, I create-and-replace "f3()", using the identical source text, and do this:
select rpad('after create-and-replace f3()', 30) as history, f1(), f2(), f3();
Here's what I see when I run my .sql script:
history | f1 | f2 | f3
--------------------------------+-----+-----+-----
at start | dog | dog | dog
after drop, re-create f1() | cat | cat | dog
after create-and-replace f3() | cat | cat | cat
--------------------------------+-----+-----+-----
at start | dog | dog | dog
after drop, re-create f1() | cat | cat | dog
after create-and-replace f3() | cat | cat | cat
I understand that the possible session-duration caching that I allow with "immutable" doesn't guarantee that I'll get any caching. But I had expected a cascade purge on anything that was cashed on delete of "f1()".
Something seems odd to me: if I do my "select f1(), f2(), f3()" after dropping "f1()" (i.e. before re-creating it) then I get an ordinary error saying that "f1()" doesn't exist. So it seems that Postgres does understand the dynamic dependency chain—even when the result from "f3()" is cached. If I then recreate "f1()" to return 'cat', I get no error—but, same as in my straight-through test, "f3()" continues to return its cached (and now "wrong") result.
Should I simply understand that when I have such a dynamic dependency chain of "immutable" functions, and should I drop and re-create the function at the start of the chain, then all bets are off until I drop and re-create every function along the rest of the chain?
--------------------------------------------------------------------------------
-- testcase.sql
drop function if exists f3() cascade;
drop function if exists f2() cascade;
drop function if exists f1() cascade;
create function f1()
returns text
immutable
language plpgsql
as $body$
begin
return 'dog';
end;
$body$;
create function f2()
returns text
immutable
language plpgsql
as $body$
begin
return f1();
end;
$body$;
create function f3()
returns text
immutable
language plpgsql
as $body$
begin
return f2();
end;
$body$;
select rpad('at start', 30) as history, f1(), f2(), f3();
\t on
drop function f1() cascade;
create function f1()
returns text
immutable
language plpgsql
as $body$
begin
return 'cat';
end;
$body$;
select rpad('after drop, re-create f1()', 30) as history, f1(), f2(), f3();
create or replace function f3()
returns text
immutable
language plpgsql
as $body$
declare
t1 constant text := f2();
begin
return t1;
end;
$body$;
select rpad('after create-and-replace f3()', 30) as history, f1(), f2(), f3();
\t off
> On Jun 28, 2022, at 18:41, Bryn Llewellyn <bryn@yugabyte.com> wrote: > Should I simply understand that when I have such a dynamic dependency chain of "immutable" functions, and should I dropand re-create the function at the start of the chain, then all bets are off until I drop and re-create every functionalong the rest of the chain? Yes. You don't have to drop and recreate the functions, though. DISCARD PLANS handles it as well: xof=# create function f1() returns text as $$ begin return 'cat'; end $$ language plpgsql immutable; CREATE FUNCTION xof=# create function f2() returns text as $$ begin return f1(); end $$ language plpgsql immutable; CREATE FUNCTION xof=# create function f3() returns text as $$ begin return f2(); end $$ language plpgsql immutable; CREATE FUNCTION xof=# select f1(), f2(), f3(); f1 | f2 | f3 -----+-----+----- cat | cat | cat (1 row) xof=# drop function f1(); DROP FUNCTION xof=# create function f1() returns text as $$ begin return 'dog'; end $$ language plpgsql immutable; CREATE FUNCTION xof=# select f1(), f2(), f3(); f1 | f2 | f3 -----+-----+----- dog | dog | cat (1 row) xof=# discard plans; DISCARD PLANS xof=# select f1(), f2(), f3(); f1 | f2 | f3 -----+-----+----- dog | dog | dog (1 row) xof=# The contract on an immutable function is that it returns the same return value for particular input values regardless ofdatabase or system state: that is, it's a pure function. Changing the definition in such a way breaks the contract, soI don't think PostgreSQL needs to do heroics to accommodate that situation. (For example, changing the definition of animmutable function that's used in an expression index could corrupt the index.) If one's fixing a bug, then rolling outthe change in a controlled way is a reasonable requirement.
Re: User's responsibility when using a chain of "immutable" functions?
From
"David G. Johnston"
Date:
On Tue, Jun 28, 2022 at 7:03 PM Christophe Pettus <xof@thebuild.com> wrote:
> On Jun 28, 2022, at 18:41, Bryn Llewellyn <bryn@yugabyte.com> wrote:
> Should I simply understand that when I have such a dynamic dependency chain of "immutable" functions, and should I drop and re-create the function at the start of the chain, then all bets are off until I drop and re-create every function along the rest of the chain?
Yes.
You don't have to drop and recreate the functions, though. DISCARD PLANS handles it as well:
Specifically:
xof=# select f1(), f2(), f3();
f1 | f2 | f3
-----+-----+-----
cat | cat | cat
(1 row)
The pl/pgsql plan cache now contains the following:
SELECT f1() => 'cat'
SELECT f2() => 'cat'
xof=# drop function f1();
DROP FUNCTION
Now the cache only contains:
SELECT f2() => 'cat'
The f1 plan has been invalidated due to the drop/replace action on the f1 function
xof=# create function f1() returns text as $$ begin return 'dog'; end $$ language plpgsql immutable;
CREATE FUNCTION
xof=# select f1(), f2(), f3();
f1 | f2 | f3
-----+-----+-----
dog | dog | cat
(1 row)
And so f3()'s invocation of "SELECT f2()" yields 'cat' from the cache since that one hasn't been invalidated. While f2() replans its f1() invocation and thus returns 'dog'
The fundamental limitation here is that there really is no attempt being made to deal with inter-functional dependencies. Their bodies are blackboxes (...wonders how this resolves in the new SQL Standard Function Bodies implementation...) and no explicit dependency information is recorded either. So we don't know that the saved plan for f2() depends on a specific version of f1() and thus if f1() is changed plans involving f2() should be invalidated along with plans involving f1(). Nor is there sufficient recognized benefit to doing so.
David J.
david.g.johnston@gmail.com wrote:xof@thebuild.com wrote:bryn@yugabyte.com wrote:
Should I simply understand that when I have such a dynamic dependency chain of "immutable" functions, and should I drop and re-create the function at the start of the chain, then all bets are off until I drop and re-create every function along the rest of the chain?
Yes.You don't have to drop and recreate the functions, though. DISCARD PLANS handles it as well:Specifically:select f1(), f2(), f3();
f1 | f2 | f3
-----+-----+-----
cat | cat | cat
The pl/pgsql plan cache now contains the following:
SELECT f1() => 'cat'
SELECT f2() => 'cat'drop function f1();
Now the cache only contains:
SELECT f2() => 'cat'
The f1 plan has been invalidated due to the drop/replace action on the f1 functioncreate function f1() returns text as $$ begin return 'dog'; end $$ language plpgsql immutable;
select f1(), f2(), f3();
f1 | f2 | f3
-----+-----+-----
dog | dog | catAnd so f3()'s invocation of "SELECT f2()" yields 'cat' from the cache since that one hasn't been invalidated. While f2() replans its f1() invocation and thus returns 'dog'
The fundamental limitation here is that there really is no attempt being made to deal with inter-functional dependencies. Their bodies are blackboxes (...wonders how this resolves in the new SQL Standard Function Bodies implementation...) and no explicit dependency information is recorded either. So we don't know that the saved plan for f2() depends on a specific version of f1() and thus if f1() is changed plans involving f2() should be invalidated along with plans involving f1(). Nor is there sufficient recognized benefit to doing so.
DISCARD PLANS is unsafe in a multi-user concurrent scenario. The doc says explicitly that its scope is just the single session. And it's easy to show the danger by using my testcase manually, step by appropriate step, with two concurrent sessions.
However, you said (indirectly) that the session-duration caching is a red herring—and that the real danger comes with an expression-based index that involves a PL/pgSQL function. I agree.
PG's lack of dependency tracking shows up with just a "worker" function f1() and a "jacket" function f2() when you base the index on f2(). You can happily drop and recreate f1() with a new implementation while the index lives on. (For the reasons that we've mentioned, the "2BP01: cannot drop function... because other objects depend on it" error doesn't occur.)
I've concluded that the only practical practice for "immutable" is to reserve its use for functions that don't mention even a single user-created artifact.
Moreover, this "hermetic" property of a to-be-immutable function can be established only by human analysis of the function's source code.
st 29. 6. 2022 v 6:28 odesílatel Bryn Llewellyn <bryn@yugabyte.com> napsal:
david.g.johnston@gmail.com wrote:xof@thebuild.com wrote:bryn@yugabyte.com wrote:
Should I simply understand that when I have such a dynamic dependency chain of "immutable" functions, and should I drop and re-create the function at the start of the chain, then all bets are off until I drop and re-create every function along the rest of the chain?
Yes.You don't have to drop and recreate the functions, though. DISCARD PLANS handles it as well:Specifically:select f1(), f2(), f3();
f1 | f2 | f3
-----+-----+-----
cat | cat | cat
The pl/pgsql plan cache now contains the following:
SELECT f1() => 'cat'
SELECT f2() => 'cat'drop function f1();
Now the cache only contains:
SELECT f2() => 'cat'
The f1 plan has been invalidated due to the drop/replace action on the f1 functioncreate function f1() returns text as $$ begin return 'dog'; end $$ language plpgsql immutable;
select f1(), f2(), f3();
f1 | f2 | f3
-----+-----+-----
dog | dog | catAnd so f3()'s invocation of "SELECT f2()" yields 'cat' from the cache since that one hasn't been invalidated. While f2() replans its f1() invocation and thus returns 'dog'
The fundamental limitation here is that there really is no attempt being made to deal with inter-functional dependencies. Their bodies are blackboxes (...wonders how this resolves in the new SQL Standard Function Bodies implementation...) and no explicit dependency information is recorded either. So we don't know that the saved plan for f2() depends on a specific version of f1() and thus if f1() is changed plans involving f2() should be invalidated along with plans involving f1(). Nor is there sufficient recognized benefit to doing so.
DISCARD PLANS is unsafe in a multi-user concurrent scenario. The doc says explicitly that its scope is just the single session. And it's easy to show the danger by using my testcase manually, step by appropriate step, with two concurrent sessions.However, you said (indirectly) that the session-duration caching is a red herring—and that the real danger comes with an expression-based index that involves a PL/pgSQL function. I agree.PG's lack of dependency tracking shows up with just a "worker" function f1() and a "jacket" function f2() when you base the index on f2(). You can happily drop and recreate f1() with a new implementation while the index lives on. (For the reasons that we've mentioned, the "2BP01: cannot drop function... because other objects depend on it" error doesn't occur.)I've concluded that the only practical practice for "immutable" is to reserve its use for functions that don't mention even a single user-created artifact.Moreover, this "hermetic" property of a to-be-immutable function can be established only by human analysis of the function's source code.
Our immutable functions are more tolerant than they should be - for real immutable functions we should disallow SQL inside functions (and everything that is not immutable (plpgsql_check raises warning in this case)), but it is allowed. On the second hand, it allows some very dirty tricks with the planner.
Regards
Pavel
Re: User's responsibility when using a chain of "immutable" functions?
From
"David G. Johnston"
Date:
On Tuesday, June 28, 2022, Bryn Llewellyn <bryn@yugabyte.com> wrote:
DISCARD PLANS is unsafe in a multi-user concurrent scenario. The doc says explicitly that its scope is just the single session. And it's easy to show the danger by using my testcase manually, step by appropriate step, with two concurrent sessions.
Impractical maybe, but hardly unsafe (well, relative to not doing anything).
I've concluded that the only practical practice for "immutable" is to reserve its use for functions that don't mention even a single user-created artifact.
That seems like too strong a position to take for me. Go ahead and build immutable utility functions. Just don’t be stupid break your promise. But even then, there are ways to fix things in case of bugs.
David J.
Pavel Stehule <pavel.stehule@gmail.com> writes: > st 29. 6. 2022 v 6:28 odesílatel Bryn Llewellyn <bryn@yugabyte.com> napsal: >> Moreover, this "hermetic" property of a to-be-immutable function can be >> established only by human analysis of the function's source code. > Our immutable functions are more tolerant than they should be - for real > immutable functions we should disallow SQL inside functions (and everything > that is not immutable (plpgsql_check raises warning in this case)), but it > is allowed. It's generally believed that Turing's proof of the undecidability of the halting problem [1] implies that it's impossible to mechanically prove or refute function properties like immutability. Admittedly, Turing was concerned with the most general case --- that is, he showed that *there exist* cases for which no algorithm can give the right answer, not that any specific practical case can't be proven. Still, that result has discouraged most people from spending much time on mechanically checking such things. If you declare a function immutable, Postgres will believe you; the consequences if you lied are on your own head. regards, tom lane [1] https://en.wikipedia.org/wiki/Halting_problem
st 29. 6. 2022 v 7:46 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> st 29. 6. 2022 v 6:28 odesílatel Bryn Llewellyn <bryn@yugabyte.com> napsal:
>> Moreover, this "hermetic" property of a to-be-immutable function can be
>> established only by human analysis of the function's source code.
> Our immutable functions are more tolerant than they should be - for real
> immutable functions we should disallow SQL inside functions (and everything
> that is not immutable (plpgsql_check raises warning in this case)), but it
> is allowed.
It's generally believed that Turing's proof of the undecidability of
the halting problem [1] implies that it's impossible to mechanically
prove or refute function properties like immutability. Admittedly,
Turing was concerned with the most general case --- that is, he showed
that *there exist* cases for which no algorithm can give the right
answer, not that any specific practical case can't be proven.
Still, that result has discouraged most people from spending much
time on mechanically checking such things. If you declare a function
immutable, Postgres will believe you; the consequences if you lied
are on your own head.
We cannot ensure that the function is immutable, but we can detect that the function is not very probably immutable (in execution time).
calling volatile function from immutable function
using SELECT from tables inside immutable function
This is a clear violation of some "protocol". I know why it is, and I don't propose change, because it can break thousands of applications. And for some specific cases the strong restrictivity can be safe but not practical.
regards, tom lane
[1] https://en.wikipedia.org/wiki/Halting_problem
Pavel Stehule <pavel.stehule@gmail.com> writes: > st 29. 6. 2022 v 7:46 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal: >> ... that result has discouraged most people from spending much >> time on mechanically checking such things. If you declare a function >> immutable, Postgres will believe you; the consequences if you lied >> are on your own head. > We cannot ensure that the function is immutable, but we can detect that the > function is not very probably immutable (in execution time). Sure, there are a lot of easy cases where we could say "that's obviously not immutable". But is it worth spending engineering effort and runtime on that? I suspect the cases that people might actually mess up are less obvious, so that we might accomplish little more than offering a false sense of security. regards, tom lane
On Tue, 2022-06-28 at 19:02 -0700, Christophe Pettus wrote: > > On Jun 28, 2022, at 18:41, Bryn Llewellyn <bryn@yugabyte.com> wrote: > > Should I simply understand that when I have such a dynamic dependency > > chain of "immutable" functions, and should I drop and re-create the > > function at the start of the chain, then all bets are off until I drop > > and re-create every function along the rest of the chain? > > Yes. That is not enough in the general case. You are not allowed to redefine an IMMUTABLE function in a way that changes its behavior: CREATE FUNCTION const(integer) RETURNS integer LANGUAGE plpgsql IMMUTABLE AS 'BEGIN RETURN $1; END;'; CREATE TABLE t (x integer); INSERT INTO t VALUES (1); CREATE INDEX ON t (const(x)); SET enable_seqscan = off; SELECT * FROM t WHERE const(x) = 1; -- returns a correct result x ═══ 1 (1 row) CREATE OR REPLACE FUNCTION const(integer) RETURNS integer LANGUAGE plpgsql IMMUTABLE AS 'BEGIN RETURN $1 + 1; END;'; SELECT * FROM t WHERE const(x) = 1; -- returns a bad result x ═══ 1 (1 row) Of course, you are allowed to cheat if you know what you are doing. But any problem you encounter that way is your own problem entirely. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
st 29. 6. 2022 v 8:29 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> st 29. 6. 2022 v 7:46 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> ... that result has discouraged most people from spending much
>> time on mechanically checking such things. If you declare a function
>> immutable, Postgres will believe you; the consequences if you lied
>> are on your own head.
> We cannot ensure that the function is immutable, but we can detect that the
> function is not very probably immutable (in execution time).
Sure, there are a lot of easy cases where we could say "that's
obviously not immutable". But is it worth spending engineering
effort and runtime on that? I suspect the cases that people
might actually mess up are less obvious, so that we might
accomplish little more than offering a false sense of security.
This is a hard question. I know so many hard performance issues are related to missing STABLE or IMMUTABLE flags of some functions.
On second hand I am relatively happy with the current state and in warnings implemented in plpgsql_check. Unfortunately, only few users know so plpgsql_check exists.
I understand that implementation of this extra check can be very expensive. It can require handling exceptions everywhere, because you need to hold caller context everywhere. And it can have zero benefit, when all customer's functions have the default volatile flag. I have no idea how to implement it better, without significant performance impact.
Regards
Pavel
regards, tom lane
> On Jun 28, 2022, at 23:42, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > That is not enough in the general case. You are not allowed to redefine > an IMMUTABLE function in a way that changes its behavior [...] I think "not allowed" is putting it too strongly. It would be a bit much to ask that every single user-written immutablefunction be 100% perfect when it is rolled out, and never have to fix any bugs in them. However, you definitely*do* have to understand that there are administrative consequences for doing so, like rebuilding indexes and invalidatingsession caches. I think that the OP's statement that you can't ever use user-defined functions from an immutablefunction is too strong, too; you need to be aware of the consequences if you change an immutable function in a waythat alters the return result for a previously-valid set of arguments.
xof@thebuild.com wrote:
I think "not allowed" is putting it too strongly. It would be a bit much to ask that every single user-written immutable function be 100% perfect when it is rolled out, and never have to fix any bugs in them. However, you definitely *do* have to understand that there are administrative consequences for doing so, like rebuilding indexes and invalidating session caches. I think that the OP's statement that you can't ever use user-defined functions from an immutable function is too strong, too; you need to be aware of the consequences if you change an immutable function in a way that alters the return result for a previously-valid set of arguments.
My email that started this discussion has prompted a lot of answers in a few branches. This means that it's too hard for me to respond carefully to everything that's been said. But it does seem that there are a few different opinions about how safety might be ensured in the face of wrong results risks and what, therefore, might define proper practice.
I realize, now, that I didn't distinguish between:
(1) What you might do (with some caution and attention to detail) in the development shop; and
(2) What you might do when patching the database backend of a deployed production system.
Case #2 is arguably clear cut—as long as you accept that there's no such thing as safe hot patching (unless it's Oracle Database and you have edition-based redefinition). So, accepting this, you have to quiesce the system and make all your changes in a self-imposed single-session fashion. Of course, you have to pay attention to expression-based indexes. But anyone who adopts my maximally cautious approach of allowing only hermetic "immutable" functions and who never uses "create or replace" will find that the index safety risk looks after itself.
And case #1 is arguably less of a concern—unless it compromises your regression testing.
Anyway...
PG has a very lightweight scheme for dependencies that tracks just a few cases—like the dependence of an expression-based index that references a user-defined function upon that function. But, significantly, function-upon-function dependence (like my testcase showed) is not tracked. This is a non-negotiable fundamental property of PG.
It's worth noting that (at least as I have understood it) the "immutable" notion is simply a hint that gives PG permission to cache results rather than to calculate them afresh on every access. And that this is a device that seeks to improve performance. But significantly, there's no robust cache invalidation scheme (and nor could there be) so this leaves it to the user to promise safety.
There's no possible discussion about the trade-off between performance and correctness. So this argues for, at least, huge caution when you think that you might mark a function "immutable". Having said this, I noted this from pavel.stehule@gmail.com:
I know so many hard performance issues are related to missing STABLE or IMMUTABLE flags of some functions.
A caveat is needed because you're not allowed to reference a user-defined function in an expression-based index unless it's marked "immutable". But this ultimately is no more than a formal prick to the user's conscience. Nothing stops you from lying through your teeth in this scenario.
It was all these considerations that led me to my proposal for *my own* practice:
(1) Never use "create or replace" to change an "immutable" function—but, rather always use "drop" and a fresh bare "create".
(2) Never mark a function "immutable" unless its behavior is determined entirely by its own source text. (I'll say this as "unless the function is hermetic".) This notion accommodates use of some built-in functions (but even there, caution is needed because of how session settings can affect the behavior of some built-ins) but it most certainly prohibits any reference to user-defined artifacts (including to other "immutable" functions.) It also prohibits catalog queries.
Both david.g.johnston@gmail.com and xof@thebuild.com have argued that my stance is overly cautious. Fair enough. They can make their own minds up. But what convinces me is the complete lack of static dependencies and the infeasibility of discovering all dynamic dependencies by exhaustive human analysis of source text when the only help you have is a catalog query to identify all "immutable" functions in a database and another to identify all indexes that reference a function. (But here, as far as I can tell, you need human inspection to spot the subset that are user-defined functions.)
One more thing...
I see now that I didn't clearly describe an outcome shown by the testcase that I copied in my email that started this thread. Here's a sufficient, shortened, version. The setup is what I showed you before.
When I drop the first link, f1() in the chain of "immutable" functions, I see that I cannot invoke f(2) because it now complains that f1() doesn't exist. This surprises me because, after all, the result of f2() is now cached (at least as I suppose) and its body isn't executed to produce the result. This outcome almost suggests that there is, after all, a dependency tracking scheme at work.
Yet I can still invoke the third link, f(3), and it still does produce the value that it had cached!
This just tells me that it's stupid to try to deduce the intended behavior of a software system by empirical testing. (It's also hard to deduce what's intended by reading the source code.)
--------------------------------------------------------------------------------
create function f1()
returns text
immutable
language plpgsql
as $body$
begin
return 'dog';
end;
$body$;
create function f2()
returns text
immutable
language plpgsql
as $body$
begin
return f1();
end;
$body$;
create function f3()
returns text
immutable
language plpgsql
as $body$
begin
return f2();
end;
$body$;
select f1(), f2(), f3();
drop function f1() cascade;
select f2(); -- errors with "function f1() does not exist"
select f3(); -- happily returns 'dog'
returns text
immutable
language plpgsql
as $body$
begin
return 'dog';
end;
$body$;
create function f2()
returns text
immutable
language plpgsql
as $body$
begin
return f1();
end;
$body$;
create function f3()
returns text
immutable
language plpgsql
as $body$
begin
return f2();
end;
$body$;
select f1(), f2(), f3();
drop function f1() cascade;
select f2(); -- errors with "function f1() does not exist"
select f3(); -- happily returns 'dog'
Re: User's responsibility when using a chain of "immutable" functions?
From
"David G. Johnston"
Date:
On Wednesday, June 29, 2022, Bryn Llewellyn <bryn@yugabyte.com> wrote:
When I drop the first link, f1() in the chain of "immutable" functions, I see that I cannot invoke f(2) because it now complains that f1() doesn't exist. This surprises me because, after all, the result of f2() is now cached (at least as I suppose) and its body isn't executed to produce the result. This outcome almost suggests that there is, after all, a dependency tracking scheme at work.Yet I can still invoke the third link, f(3), and it still does produce the value that it had cached!
The “cache” is just a prepared statement plan. You didn’t create any of those yourself at the top SQL context so you don’t see caching effects in the stuff you execute in SQL directly.
Pl/pgsql, however, creates prepared statement plans for any internal SQL it executes (i.e., it compiles the function). That is the caching artefact you are seeing and why I mentioned pl/pgsql in my reply where I described why you saw the results you did.
IOW, when you write : “ select f2(); “ in SQL f2() is always called, it is never durectly replaced with a cached value. The documentation does say this though I lack the relevant paragraph reference at the moment.
David J.
david.g.johnston@gmail.com wrote:bryn@yugabyte.com wrote:
When I drop the first link, f1() in the chain of "immutable" functions, I see that I cannot invoke f(2) because it now complains that f1() doesn't exist. This surprises me because, after all, the result of f2() is now cached (at least as I suppose) and its body isn't executed to produce the result. This outcome almost suggests that there is, after all, a dependency tracking scheme at work.
Yet I can still invoke the third link, f(3), and it still does produce the value that it had cached!
The “cache” is just a prepared statement plan. You didn’t create any of those yourself at the top SQL context so you don’t see caching effects in the stuff you execute in SQL directly.
PL/pgSQL, however, creates prepared statement plans for any internal SQL it executes (i.e., it compiles the function). That is the caching artefact you are seeing and why I mentioned pl/pgsql in my reply where I described why you saw the results you did.
Thanks again. Got this, too, now.
IOW, when you write : "select f2();" in SQL, f2() is always called, it is never directly replaced with a cached value. The documentation does say this though I lack the relevant paragraph reference at the moment.
Ah... but where is it!
Thanks, David. It's clear that my mental model has been missing at least one critical piece. (My instincts have been conditioned by too many years with Oracle Database to realize when I'm making some wrong assumption in a PG context.)
Anyway... armed with this new knowledge, I can write a much simpler demo of this risk brought when an "ummutable" function calls another with just two function.
I copied this below. But then I made a slight change. And this brought an error that I can't explain. Any ideas?
Meanwhile. I'll appeal for some pointers to what I should read:
I *had* understood that the SQL that a user-created subprogram issues (at least for "language sql" and "language plpgsql") is implicitly prepared. But I've no idea what it uses as the "handle" for such a prepared statement. Might it be, for example, a node in the AST that represents the subprogram or anonymous block in my session? In the same way, I've no idea what the outcome is when two different subprograms issue the identical (or identical post-canonicalization) SQL statement text. I don't know how to search the PG doc to find the explanations that I need. For example "pl/pgsql execution model" gets just a single hit in in a piece about locking. Wider Internet search gets too much noise, and too much unreliable suff from self-appointed experts, to be of much help. I was excited to find "Plpgsql_internals.pdf" by pavel.stehule@gmail.com. But disappointed to find that it didn't answer my questions (and nor could it when it explains things w.r.t the C implementation).
--------------------------------------------------------------------------------
I *had* understood that the SQL that a user-created subprogram issues (at least for "language sql" and "language plpgsql") is implicitly prepared. But I've no idea what it uses as the "handle" for such a prepared statement. Might it be, for example, a node in the AST that represents the subprogram or anonymous block in my session? In the same way, I've no idea what the outcome is when two different subprograms issue the identical (or identical post-canonicalization) SQL statement text. I don't know how to search the PG doc to find the explanations that I need. For example "pl/pgsql execution model" gets just a single hit in in a piece about locking. Wider Internet search gets too much noise, and too much unreliable suff from self-appointed experts, to be of much help. I was excited to find "Plpgsql_internals.pdf" by pavel.stehule@gmail.com. But disappointed to find that it didn't answer my questions (and nor could it when it explains things w.r.t the C implementation).
Does anybody have any recommendations for what I might study?
Of course, I started with the account of "immutable" in the "create function" doc:
«
IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.
Of course, I started with the account of "immutable" in the "create function" doc:
«
IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.
»
"immediately replaced with the function value" implies a cache. But not when it's populated (and when it isn't) or what its lifetime might be. It certainly doesn't mention cache invalidation.
Of course, I read the account in the "prepare" doc too. That says nothing to the effect that the result of prepare, when the SQL statement references an "immutable" function, is to build a cache whose key is the vector of input actuals and whose payload is the corresponding return value. Nor does it mention the cache's capacity and what happens when (if) the cache becomes full.
About your comment thus:
"select f2();" in (explicit) SQL, f2() is always called, it is never directly replaced with a cached value
I suppose that this can be deduced from the fact that the cache mechanism is the prepared statement (not that this latter point is spelled out).
--------------------------------------------------------------------------------
-- TEST ONE. FITS MY MENTAL MODEL. NICELY SHOWS THE "WRONG RESULTS" RISK.
create function f1()
returns text
immutable
language plpgsql
as $body$
begin
return 'dog';
end;
$body$;
create function f2()
returns text
immutable
language plpgsql
as $body$
begin
return f1();
end;
$body$;
prepare q1 as select f1(), f2();
execute q1; --------------------------------<< Gets "dog | dog"
prepare q2 as select f2();
execute q2; --------------------------------<< Gets "dog"
/*
Presumably dropping f1() invalidates q1 but
leaves its definition intact so that it can later be re-vaildated
when f1() exists again.
*/;
drop function f1() cascade;
execute q2; --------------------------------<< Still gets "dog"
create function f1()
returns text
immutable
language plpgsql
as $body$
begin
return 'cat';
end;
$body$;
execute q1; --------------------------------<< Now gets "cat | cat"
execute q2; --------------------------------<< Still gets "dog"
returns text
immutable
language plpgsql
as $body$
begin
return 'dog';
end;
$body$;
create function f2()
returns text
immutable
language plpgsql
as $body$
begin
return f1();
end;
$body$;
prepare q1 as select f1(), f2();
execute q1; --------------------------------<< Gets "dog | dog"
prepare q2 as select f2();
execute q2; --------------------------------<< Gets "dog"
/*
Presumably dropping f1() invalidates q1 but
leaves its definition intact so that it can later be re-vaildated
when f1() exists again.
*/;
drop function f1() cascade;
execute q2; --------------------------------<< Still gets "dog"
create function f1()
returns text
immutable
language plpgsql
as $body$
begin
return 'cat';
end;
$body$;
execute q1; --------------------------------<< Now gets "cat | cat"
execute q2; --------------------------------<< Still gets "dog"
--------------------------------------------------------------------------------
-- TEST TWO. BREAKS MY MENTAL MODEL.
-- ALL I DID WAS CHANGE f1() AND f2() TO HAVE A TEXT PARAMETER
-- AND TO USE "lower()" IN THE BODY OF f1(),
create function f1(t in text)
returns text
immutable
language plpgsql
as $body$
begin
return lower(t);
end;
$body$;
create function f2(t in text)
returns text
immutable
language plpgsql
as $body$
begin
return f1(t);
end;
$body$;
prepare q1(text) as select f1($1), f2($1);
execute q1('DOG'); -------------------------<< Gets "dog | dog"
prepare q2(text) as select f2($1);
execute q2('CAT'); -------------------------<< Gets "cat"
drop function f1(text) cascade;
-- Now fails with "function f1(text) does not exist"
execute q2('CAT');
returns text
immutable
language plpgsql
as $body$
begin
return lower(t);
end;
$body$;
create function f2(t in text)
returns text
immutable
language plpgsql
as $body$
begin
return f1(t);
end;
$body$;
prepare q1(text) as select f1($1), f2($1);
execute q1('DOG'); -------------------------<< Gets "dog | dog"
prepare q2(text) as select f2($1);
execute q2('CAT'); -------------------------<< Gets "cat"
drop function f1(text) cascade;
-- Now fails with "function f1(text) does not exist"
execute q2('CAT');
Re: User's responsibility when using a chain of "immutable" functions?
From
"David G. Johnston"
Date:
On Wed, Jun 29, 2022 at 5:03 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
Meanwhile. I'll appeal for some pointers to what I should read:
I *had* understood that the SQL that a user-created subprogram issues (at least for "language sql" and "language plpgsql") is implicitly prepared. But I've no idea what it uses as the "handle" for such a prepared statement. Might it be, for example, a node in the AST that represents the subprogram or anonymous block in my session? In the same way, I've no idea what the outcome is when two different subprograms issue the identical (or identical post-canonicalization) SQL statement text. I don't know how to search the PG doc to find the explanations that I need. For example "pl/pgsql execution model" gets just a single hit in in a piece about locking.
I tend not to search...or at least that isn't my first (or at least only) recourse.
The pg/pgsql chapter has a subchapter named "Plan Caching":
Wider Internet search gets too much noise, and too much unreliable suff from self-appointed experts, to be of much help. I was excited to find "Plpgsql_internals.pdf" by pavel.stehule@gmail.com. But disappointed to find that it didn't answer my questions (and nor could it when it explains things w.r.t the C implementation).Does anybody have any recommendations for what I might study?
Of course, I started with the account of "immutable" in the "create function" doc:
«
IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.»"immediately replaced with the function value" implies a cache. But not when it's populated (and when it isn't) or what its lifetime might be. It certainly doesn't mention cache invalidation.
You really need to read the "see related" reference there to get the level of detail that you want:
"This category allows the optimizer to pre-evaluate the function when a query calls it with constant arguments."
The implication is that this operation is not session-scoped but query-scoped.
Other parts of the page reinforce this. Not saying it is perfect wording but I came by my understanding pretty much exclusively from this documentation.
/*
Presumably dropping f1() invalidates q1 but
leaves its definition intact so that it can later be re-vaildated
when f1() exists again.
*/;
drop function f1() cascade;
execute q2; --------------------------------<< Still gets "dog"
I think my cache example was misleading...
for f2():
LINE 2 in text: SELECT f1();
LINE 2 in the compiled code: SELECT 'dog'; -- no input arg so replace the call with its constant return value
---------------------------------------------------------------------------------- TEST TWO. BREAKS MY MENTAL MODEL.-- ALL I DID WAS CHANGE f1() AND f2() TO HAVE A TEXT PARAMETER
See below - a "this is all I did" is totally insufficient.
-- Now fails with "function f1(text) does not exist"
execute q2('CAT');
f2(text) cannot do anything because the return result, while immutable, depends upon the value of "t".
Plausibly you might be able to produce something like:
select f2('DOG') => 'dog'
redefine f1(test); return 'cat'
select f2('DOG') => 'dog'
DISCARD ALL
select f2('DOG') => 'cat'
But in short extrapolating from a zero-argument scenario to a one-argument scenario makes no sense. There are many more things to worry about, and thus more reason to not optimize, when an input argument is involved. Specifically, it is unlikely to be worth doing anything except within the scope of a single query.
David J.
david.g.johnston@gmail.com wrote:
bryn@yugabyte.com wrote:
Meanwhile. I'll appeal for some pointers to what I should read...
I tend not to search...or at least that isn't my first (or at least only) recourse. The pg/pgsql chapter has a subchapter named "Plan Caching":
https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
You really need to read the "see related" reference there to get the level of detail that you want:
https://www.postgresql.org/docs/current/xfunc-volatility.html
"This category allows the optimizer to pre-evaluate the function when a query calls it with constant arguments."
The implication is that this operation is not session-scoped but query-scoped. Other parts of the page reinforce this. Not saying it is perfect wording but I came by my understanding pretty much exclusively from this documentation.
Thank you very much for the doc pointers, David. I believe that I have all I need, now. I understood already that "giving permission to cache" doesn't mean that PG will actually cache anything. I wanted only to find a compelling example of how lying when you mark a function "immutable" can bring wring results. I think that this is sufficient:
set x.a = '13';
create function dishonestly_marked_immutable(i in int)
returns int
immutable
language plpgsql
as $body$
begin
return i*(current_setting('x.a')::int);
end;
$body$;
prepare q as
select
dishonestly_marked_immutable(2) as "With actual '2'",
dishonestly_marked_immutable(3) as "With actual '3'";
execute q;
set x.a = '19';
execute q; ------------------<< Produces the stale "26 | 39".
discard plans;
execute q; ------------------<< Now produces the correct "38 | 57"
create function dishonestly_marked_immutable(i in int)
returns int
immutable
language plpgsql
as $body$
begin
return i*(current_setting('x.a')::int);
end;
$body$;
prepare q as
select
dishonestly_marked_immutable(2) as "With actual '2'",
dishonestly_marked_immutable(3) as "With actual '3'";
execute q;
set x.a = '19';
execute q; ------------------<< Produces the stale "26 | 39".
discard plans;
execute q; ------------------<< Now produces the correct "38 | 57"