Thread: When IMMUTABLE is not.
Good day, hackers. I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure function doesn't manipulate data. In fact, SPI checks only direct function kind, but fails to check indirect call. Attached immutable_not.sql creates 3 functions: - `immutable_direct` is IMMUTABLE and tries to insert into table directly. PostgreSQL correctly detects and forbids this action. - `volatile_direct` is VOLATILE and inserts into table directly. It is allowed and executed well. - `immutable_indirect` is IMMUTABLE and calls `volatile_direct`. PostgreSQL failed to detect and prevent this DML manipulation. Output: select immutable_direct('immutable_direct'); psql:immutable_not.sql:28: ERROR: INSERT is not allowed in a non-volatile function CONTEXT: SQL statement "insert into xxx values(j)" PL/pgSQL function immutable_direct(character varying) line 3 at SQL statement select volatile_direct('volatile_direct'); volatile_direct ----------------- volatile_direct (1 row) select immutable_indirect('immutable_indirect'); immutable_indirect -------------------- immutable_indirect (1 row) select * from xxx; i -------------------- volatile_direct immutable_indirect (2 rows) Attached forbid-non-volatile-mutations.diff add checks readonly function didn't made data manipulations. Output for patched version: select immutable_indirect('immutable_indirect'); psql:immutable_not.sql:32: ERROR: Damn2! Update were done in a non-volatile function CONTEXT: SQL statement "SELECT volatile_direct(j)" PL/pgSQL function immutable_indirect(character varying) line 3 at PERFORM I doubt check should be done this way. This check is necessary, but it should be FATAL instead of ERROR. And ERROR should be generated at same place, when it is generated for `immutable_direct`, but with check of "read_only" status through whole call stack instead of just direct function kind. ----- regards, Yura Sokolov Postgres Professional
Attachment
Sorry, previous message were smashed for some reason. I'll try to repeat I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure function doesn't manipulate data. In fact, SPI checks only direct function kind, but fails to check indirect call. Attached immutable_not.sql creates 3 functions: - `immutable_direct` is IMMUTABLE and tries to insert into table directly. PostgreSQL correctly detects and forbids this action. - `volatile_direct` is VOLATILE and inserts into table directly. It is allowed and executed well. - `immutable_indirect` is IMMUTABLE and calls `volatile_direct`. PostgreSQL failed to detect and prevent this DML manipulation. Output: select immutable_direct('immutable_direct'); psql:immutable_not.sql:28: ERROR: INSERT is not allowed in a non-volatile function CONTEXT: SQL statement "insert into xxx values(j)" PL/pgSQL function immutable_direct(character varying) line 3 at SQL statement select volatile_direct('volatile_direct'); volatile_direct ----------------- volatile_direct (1 row) select immutable_indirect('immutable_indirect'); immutable_indirect -------------------- immutable_indirect (1 row) select * from xxx; i -------------------- volatile_direct immutable_indirect (2 rows) Attached forbid-non-volatile-mutations.diff add checks readonly function didn't made data manipulations. Output for patched version: select immutable_indirect('immutable_indirect'); psql:immutable_not.sql:32: ERROR: Damn2! Update were done in a non-volatile function CONTEXT: SQL statement "SELECT volatile_direct(j)" PL/pgSQL function immutable_indirect(character varying) line 3 at PERFORM I doubt check should be done this way. This check is necessary, but it should be FATAL instead of ERROR. And ERROR should be generated at same place, when it is generated for `immutable_direct`, but with check of "read_only" status through whole call stack instead of just direct function kind. ----- regards, Yura Sokolov Postgres Professional
On Thu, 2023-06-15 at 13:22 +0300, Yura Sokolov wrote: > Good day, hackers. > > I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure > function doesn't manipulate data. > > [...] > > + errmsg("Damn1! Update were done in a non-volatile function"))); I think it is project policy to start error messages with a lower case character. Yours, Laurenz Albe
Yura Sokolov <y.sokolov@postgrespro.ru> writes: > I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure > function doesn't manipulate data. Of course not. It is the user's responsibility to mark functions properly. Trying to enforce that completely is a fool's errand; you soon get into trying to solve the halting problem. I don't like anything about the proposed patch. It's necessarily only a partial solution, and it probably breaks cases that are perfectly safe in context. regards, tom lane
15.06.2023 16:21, Tom Lane wrote: > Yura Sokolov <y.sokolov@postgrespro.ru> writes: >> I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure >> function doesn't manipulate data. > Of course not. It is the user's responsibility to mark functions > properly. Trying to enforce that completely is a fool's errand https://github.com/postgres/postgres/commit/b2c4071299e02ed96d48d3c8e776de2fab36f88c.patch https://github.com/postgres/postgres/commit/cdf8b56d5463815244467ea8f5ec6e72b6c65a6c.patch
On 2023-06-15 09:21, Tom Lane wrote: > Yura Sokolov <y.sokolov@postgrespro.ru> writes: >> not enough to be sure function doesn't manipulate data. > > Of course not. It is the user's responsibility to mark functions > properly. And also, isn't it the case that IMMUTABLE should mark a function, not merely that "doesn't manipulate data", but whose return value doesn't depend in any way on data (outside its own arguments)? The practice among PLs of choosing an SPI readonly flag based on the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of peculiar heuristic, not something inherent in what that declaration means to the optimizer. (And also influences what snapshot the function is looking at, and therefore what it can see, which has also struck me more as a tacked-on effect than something inherent in the declaration's meaning.) Regards, -Chap
15.06.2023 16:58, chap@anastigmatix.net пишет: > On 2023-06-15 09:21, Tom Lane wrote: >> Yura Sokolov <y.sokolov@postgrespro.ru> writes: >>> not enough to be sure function doesn't manipulate data. >> >> Of course not. It is the user's responsibility to mark functions >> properly. > > And also, isn't it the case that IMMUTABLE should mark a function, > not merely that "doesn't manipulate data", but whose return value > doesn't depend in any way on data (outside its own arguments)? > > The practice among PLs of choosing an SPI readonly flag based on > the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of > peculiar heuristic, not something inherent in what that declaration > means to the optimizer. (And also influences what snapshot the > function is looking at, and therefore what it can see, which has > also struck me more as a tacked-on effect than something inherent > in the declaration's meaning.) Documentation disagrees: https://www.postgresql.org/docs/current/sql-createfunction.html#:~:text=IMMUTABLE%0ASTABLE%0AVOLATILE > |IMMUTABLE|indicates that the function cannot modify the database and always returns the same result when given the same argument values > |STABLE|indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. > |VOLATILE|indicates that the function value can change even within a single table scan, so no optimizations can be made... But note that any function that has side-effects must be classified volatile, even if its result is quite predictable, to prevent calls from being optimized away
chap@anastigmatix.net writes: > And also, isn't it the case that IMMUTABLE should mark a function, > not merely that "doesn't manipulate data", but whose return value > doesn't depend in any way on data (outside its own arguments)? Right. We can't realistically enforce that either, so it's up to the user. > The practice among PLs of choosing an SPI readonly flag based on > the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of > peculiar heuristic, not something inherent in what that declaration > means to the optimizer. (And also influences what snapshot the > function is looking at, and therefore what it can see, which has > also struck me more as a tacked-on effect than something inherent > in the declaration's meaning.) Well, it is a bit odd at first sight, but these properties play together well. See https://www.postgresql.org/docs/current/xfunc-volatility.html regards, tom lane
On 2023-06-15 09:58, chap@anastigmatix.net wrote: > also influences what snapshot the > function is looking at, and therefore what it can see, which has > also struck me more as a tacked-on effect than something inherent > in the declaration's meaning. I just re-read that and realized I should anticipate the obvious response "but how can it matter what the function can see, if it's IMMUTABLE and depends on no data?". So, I ran into the effect while working on PL/Java, where the code of a function isn't all found in pg_proc.prosrc; that just indicates what code has to be fetched from sqlj.jar_entry. So one could take a strict view that "no PL/Java function should ever be marked IMMUTABLE" because every one depends on fetching something (once, at least). But on the other hand, it would seem punctilious to say that f(int x, int y) { return x + y; } isn't IMMUTABLE, only because it depends on a fetch /of its own implementation/, and overall its behavior is better described by marking it IMMUTABLE. Regards, -Chap
On Thursday, June 15, 2023, <chap@anastigmatix.net> wrote:
So one could take a strict view that "no PL/Java function should
ever be marked IMMUTABLE" because every one depends on fetching
something (once, at least).
The failure to find and execute the function code itself is not a failure mode that these markers need be concerned with. Assuming one can execute the function an immutable function will give the same answer for the same input for all time.
David J.
On 2023-06-15 10:19, David G. Johnston wrote: > The failure to find and execute the function code itself is not a > failure > mode that these markers need be concerned with. Assuming one can > execute > the function an immutable function will give the same answer for the > same > input for all time. That was the view I ultimately took, and just made PL/Java suppress that SPI readonly flag when going to look for the function code. Until that change, you could run into the not-uncommon situation where you've just loaded a jar of new functions and try to use them in the same transaction, and hey presto, the VOLATILE ones all work, and the IMMUTABLE ones aren't there yet. Regards, -Chap
"David G. Johnston" <david.g.johnston@gmail.com> writes: > The failure to find and execute the function code itself is not a failure > mode that these markers need be concerned with. Assuming one can execute > the function an immutable function will give the same answer for the same > input for all time. The viewpoint taken in the docs I mentioned is that an IMMUTABLE marker is a promise from the user to the system about the behavior of a function. While the system does provide a few simple tools to catch obvious errors and to make it easier to write functions that obey such promises, it's mostly on the user to get it right. In particular, we've never enforced that an immutable function can't call non-immutable functions. While that would seem like a good idea in the abstract, we've intentionally not tried to do it. (I'm pretty sure there is more than one round of previous discussions of the point in the archives, although locating relevant threads seems hard.) One reason not to is that polymorphic functions have to be marked with worst-case volatility labels. There are plenty of examples of functions that are stable for some input types and immutable for others (array_to_string, for instance); but the marking system can't represent that so we have to label them stable. Enforcing that a user-defined immutable function can't use such a function might just break things for no gain. regards, tom lane
On Thu, 15 Jun 2023 at 10:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In particular, we've never enforced that an immutable function can't
call non-immutable functions. While that would seem like a good idea
in the abstract, we've intentionally not tried to do it. (I'm pretty
sure there is more than one round of previous discussions of the point
in the archives, although locating relevant threads seems hard.)
One reason not to is that polymorphic functions have to be marked
with worst-case volatility labels. There are plenty of examples of
functions that are stable for some input types and immutable for
others (array_to_string, for instance); but the marking system can't
represent that so we have to label them stable. Enforcing that a
user-defined immutable function can't use such a function might
just break things for no gain.
More sophisticated type systems (which I am *not* volunteering to graft onto Postgres) can handle some of this, but even Haskell has unsafePerformIO. The current policy is both wise and practical.
15.06.2023 17:49, Tom Lane пишет: > "David G. Johnston" <david.g.johnston@gmail.com> writes: >> The failure to find and execute the function code itself is not a failure >> mode that these markers need be concerned with. Assuming one can execute >> the function an immutable function will give the same answer for the same >> input for all time. > The viewpoint taken in the docs I mentioned is that an IMMUTABLE > marker is a promise from the user to the system about the behavior > of a function. While the system does provide a few simple tools > to catch obvious errors and to make it easier to write functions > that obey such promises, it's mostly on the user to get it right. > > In particular, we've never enforced that an immutable function can't > call non-immutable functions. While that would seem like a good idea > in the abstract, we've intentionally not tried to do it. (I'm pretty > sure there is more than one round of previous discussions of the point > in the archives, although locating relevant threads seems hard.) > One reason not to is that polymorphic functions have to be marked > with worst-case volatility labels. There are plenty of examples of > functions that are stable for some input types and immutable for > others (array_to_string, for instance); but the marking system can't > represent that so we have to label them stable. Enforcing that a > user-defined immutable function can't use such a function might > just break things for no gain. "Stable vs Immutable" is much lesser problem compared to "ReadOnly vs Volatile". Executing fairly read-only function more times than necessary (or less times), doesn't modify data in unexpecting way. But executing immutable/stable function, that occasionally modifies data, could lead to different unexpected effects due to optimizer decided to call them more or less times than query assumes. Some vulnerabilities were present due to user defined functions used in index definitions started to modify data. If "read-only" execution were forced in index operations, those issues couldn't happen. > it's mostly on the user to get it right. It is really bad premise. Users does strange things and aren't expected to be professionals who really understand whole PostgreSQL internals. And it is strange to hear it at the same time we don't allow users to do query hints since "optimizer does better" :-D Ok, I'd go and cool myself. Certainly I don't get some point.