Thread: Tightening behaviour for non-immutable behaviour in immutable functions
Recently I had someone complaining about a pg_restore failure and I believe we semi-regularly get complaints that are similar -- though I'm having trouble searching for them because the keywords "dump restore failure" are pretty generic. The root cause here -- and I believe for a lot of users -- are functions that are declared IMMUTABLE but are not for reasons that aren't obvious to the user. Indeed poking at this more carefully I think it's downright challenging to write an IMMUTABLE function at all. I suspect *most* users, perhaps even nearly all users, who write functions intending them to be immutable are actually not really as successful as they believe. The biggest culprit is of course search_path. Afaics it's nigh impossible to write any non-trivial immutable function without just setting the search_path GUC on the function. And there's nothing Postgres that requires that. I don't even see anything in the docs recommending it. Many users probably always run with the same search_path so in practice they're probably mostly safe. But one day they could insert data with a different search path with a different function definition in their path and corrupt their index which would be.... poor... Or as in my user they could discover the problem only in the middle of an upgrade which is a terrible time to discover it. I would suggest we should probably at the very least warn users if they create an immutable function that doesn't have search_path set for the function. I would actually prefer to make it an error but that brings in compatibility issues. Perhaps it could be optional. But putting a GUC on a function imposes a pretty heavy performance cost. I'm not sure how bad it is compared to running plpgsql code let alone other languages but IIUC it invalidates some catalog caches which for something happening repeatedly in, e.g. a data load would be pretty bad. It would be nice to have a way to avoid the performance cost and I see two options. Thinking of plpgsql here, we already run the raw parser on all sql when the function is defined. We could somehow check whether the raw_parser found any non-schema-qualified references. This looks like it would be awkward but doable. That would allow users to write non-search_path-dependent code and if postgres doesn't warn they would know they've done it properly. It would still put quite a burden on users, especially when it comes to operators... Or alternatively we could offer lexical scoping so that all objects are looked up at parse time and the fully qualified reference is stored instead of the non-qualified reference. That would be more similar to how views and other object references are handled. I suppose there's a third option that we could provide something which instead of *setting* the guc when a function is entered just verifies that guc is set as expected. That way the function would simply throw an error if search_path is "incorrect" and not have to invalidate any caches. That would at least avoid index corruption but not guarantee dump/reload would work. -- greg
> On Jun 8, 2022, at 2:42 PM, Greg Stark <stark@mit.edu> wrote: > > Thinking of plpgsql here, we already run the raw parser on all sql > when the function is defined. We could somehow check whether the > raw_parser found any non-schema-qualified references. This looks like > it would be awkward but doable. That would allow users to write > non-search_path-dependent code and if postgres doesn't warn they would > know they've done it properly. It would still put quite a burden on > users, especially when it comes to operators... > > Or alternatively we could offer lexical scoping so that all objects > are looked up at parse time and the fully qualified reference is > stored instead of the non-qualified reference. That would be more > similar to how views and other object references are handled. I like the general idea, but I'm confused why you are limiting the analysis to search path resolution. The following isclearly wrong, but not for that reason: create function public.identity () returns double precision as $$ select random()::integer; $$ language sql immutable parallel safe -- set search_path to 'pg_catalog' ; Uncommenting that last bit wouldn't make it much better. Isn't the more general approach to look for non-immutable (or non-stable) operations, with object resolution just one typeof non-immutable operation? Perhaps raise an error when you can prove the given function's provolatile marking is wrong,and a warning when you cannot prove the marking is correct? That would tend to give warnings for polymorphic functionsthat use functions or operators over the polymorphic types, or which use dynamic sql, but maybe that's ok. Thosefunctions probably deserve closer scrutiny anyway. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 8 Jun 2022 at 19:39, Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > > I like the general idea, but I'm confused why you are limiting the analysis to search path resolution. The following isclearly wrong, but not for that reason: > > create function public.identity () returns double precision as $$ > select random()::integer; Well.... I did originally think it would be necessary to consider cases like this. (or even just cases where you call a user function that is not immutable because of search_path dependencies). But there are two problems: Firstly, that would be a lot harder to implement. We don't actually do any object lookups in plpgsql when defining plpgsql functions. So this would be a much bigger change. But secondly, there are a lot of cases where non-immutable functions *are* immutable if they're used carefully. to_char() is obviously the common example, but it's perfectly safe if you set the timezone or other locale settings or if your format string doesn't actually depend on any settings. Similarly, a user function that is non-immutable only due to a dependency on search_path *would* be safe to call from within an immutable function if that function does set search_path. The search_path would be inherited alleviating the problem. Even something like random() could be safely used in an immutable function as long as it doesn't actually change the output -- say if it just logs diagnostic messages as a result? Generally I think the idea is that the user *is* responsible for writing immutable functions carefully to hide any non-deterministic behaviour from the code they're calling. But that does raise the question of why to focus on search_path. I guess I'm just saying my goal isn't to *prove* the code is correct. The user is still responsible for asserting it's correct. I just want to detect cases where I can prove (or at least show it's likely that) it's *not* correct. -- greg
Re: Tightening behaviour for non-immutable behaviour in immutable functions
From
Peter Geoghegan
Date:
On Thu, Jun 9, 2022 at 12:39 PM Greg Stark <stark@mit.edu> wrote: > Generally I think the idea is that the user *is* responsible for > writing immutable functions carefully to hide any non-deterministic > behaviour from the code they're calling. But that does raise the > question of why to focus on search_path. > > I guess I'm just saying my goal isn't to *prove* the code is correct. > The user is still responsible for asserting it's correct. I just want > to detect cases where I can prove (or at least show it's likely that) > it's *not* correct. Right. It's virtually impossible to prove that, for many reasons, so the final responsibility must lie with the user-defined code. Presumably there is still significant value in detecting cases where some user-defined code provably does the wrong thing. Especially by targeting mistakes that experience has shown are relatively common. That's what the search_path case seems like to me. If somebody else wants to write another patch that adds on that, great. If not, then having this much still seems useful. -- Peter Geoghegan
On Thu, 9 Jun 2022 at 16:12, Peter Geoghegan <pg@bowt.ie> wrote: > > Presumably there is still significant value in detecting cases where > some user-defined code provably does the wrong thing. Especially by > targeting mistakes that experience has shown are relatively common. > That's what the search_path case seems like to me. By "relatively common" I think we're talking "nigh universal". Afaics there are no warnings in the docs about worrying about search_path on IMMUTABLE functions. There is for SECURITY DEFINER but I have to admit I wasn't aware myself of all the gotchas described there. For that matter.... the gotchas are a bit .... convoluted.... If you leave out pg_catalog from search_path that's fine but if you leave out pg_temp that's a security disaster. If you put pg_catalog in it better be first or else it might be ok or might be a security issue but when you put pg_temp in it better be last or else it's *definitely* a disaster. $user is in search_path by default and that's fine for SECURITY DEFINER functions but it's a disaster for IMMUTABLE functions... I kind of feel like perhaps all the implicit stuff is unnecessary baroque frills. We should just put pg_temp and pg_catalog into the default postgresql.conf search_path and assume users will keep them there. And I'm not sure why we put pg_temp *first* -- I mean it sort of seems superficially sensible but it doesn't seem like there's any real reason to name your temporary tables the same as your permanent ones so why not just always add it last? I've attached a very WIP patch that implements the checks I'm leaning towards making (as elogs currently). They cause a ton of regression failures so probably we need to think about how to reduce the pain for users upgrading... Perhaps we should automatically fix up the current search patch and attach it to functions where necessary for users instead of just whingeing at them....
Attachment
Re: Tightening behaviour for non-immutable behaviour in immutable functions
From
Peter Geoghegan
Date:
On Mon, Jun 13, 2022 at 1:51 PM Greg Stark <stark@mit.edu> wrote: > By "relatively common" I think we're talking "nigh universal". Afaics > there are no warnings in the docs about worrying about search_path on > IMMUTABLE functions. There is for SECURITY DEFINER but I have to admit > I wasn't aware myself of all the gotchas described there. I didn't realize that it was that bad. Even if it's only 10% as bad as you say, it would still be very valuable to do something about it (ideally with an approach that is non-invasive). -- Peter Geoghegan
Re: Tightening behaviour for non-immutable behaviour in immutable functions
From
Michael Paquier
Date:
On Mon, Jun 13, 2022 at 06:41:17PM -0700, Peter Geoghegan wrote: > On Mon, Jun 13, 2022 at 1:51 PM Greg Stark <stark@mit.edu> wrote: >> By "relatively common" I think we're talking "nigh universal". Afaics >> there are no warnings in the docs about worrying about search_path on >> IMMUTABLE functions. There is for SECURITY DEFINER but I have to admit >> I wasn't aware myself of all the gotchas described there. > > I didn't realize that it was that bad. Even if it's only 10% as bad as > you say, it would still be very valuable to do something about it > (ideally with an approach that is non-invasive). Having checks implemented so as users cannot bite themselves back is a good idea in the long term, but I have also seen cases where abusing of immutable functions was useful: - Enforce the push down of function expressions to remote server. - Regression tests. Just a few weeks ago I have played with an advisory lock within an index expression. Perhaps I never should have done what the first point was doing anyway, but having a way to disable any of that, be it just a developer option for the purpose of some regression tests, would be nice. Skimming quickly through the patch, any of the checks related to search_path would not apply to the fancy cases I saw, though. -- Michael
Attachment
On Mon, 13 Jun 2022 at 16:50, Greg Stark <stark@mit.edu> wrote: > > For that matter.... the gotchas are a bit .... convoluted.... > > Perhaps we should automatically fix up the current search patch and > attach it to functions where necessary for users instead of just > whingeing at them.... So I reviewed my own patch and.... it was completely broken... I fixed it to actually check the right variables. I also implemented the other idea above of actually fixing up search_path in proconfig for the user by default. I think this is going to be more practical. The problem with expecting the user to provide search_path is that they probably aren't today so the warnings would be firing for everything... Providing a fixed up search_path for users would give them a smoother upgrade process where we can give a warning only if the search_path is changed substantively which is much less likely. I'm still quite concerned about the performance impact of having search_path on so many functions. It causes a cache flush which could be pretty painful on a frequently called function such as one in an index expression during a data load.... The other issue is that having proconfig set does prevent these functions from being inlined which can be seen in the regression test as seen below. I'm not sure how big a problem this is for users. Inlining is important for many use cases I think. Maybe we can go ahead and inline things even if they have a proconfig if it matches the proconfig on the caller? Or maybe even check if we get the same objects from both search_paths? Of course this patch is still very WIP. Only one or the other function makes sense to keep. And I'm not opposed to having a GUC to enable/disable the enforcement or warnings. And the code itself needs to be cleaned up with parts of it moving to guc.c and/or namespace.c. Example of regression tests noticing that immutable functions with proconfig set become non-inlineable: diff -U3 /home/stark/src/postgresql/src/test/regress/expected/rangefuncs.out /home/stark/src/postgresql/src/test/regress/results/rangefuncs.out --- /home/stark/src/postgresql/src/test/regress/expected/rangefuncs.out 2022-01-17 12:01:54.958628564 -0500 +++ /home/stark/src/postgresql/src/test/regress/results/rangefuncs.out 2022-06-16 02:16:47.589703966 -0400 @@ -1924,14 +1924,14 @@ select * from array_to_set(array['one', 'two']) as t(f1 point,f2 text); ERROR: return type mismatch in function declared to return record DETAIL: Final statement returns integer instead of point at column 1. -CONTEXT: SQL function "array_to_set" during inlining +CONTEXT: SQL function "array_to_set" during startup explain (verbose, costs off) select * from array_to_set(array['one', 'two']) as t(f1 numeric(4,2),f2 text); - QUERY PLAN --------------------------------------------------------------- - Function Scan on pg_catalog.generate_subscripts i - Output: i.i, ('{one,two}'::text[])[i.i] - Function Call: generate_subscripts('{one,two}'::text[], 1) + QUERY PLAN +---------------------------------------------------- + Function Scan on public.array_to_set t + Output: f1, f2 + Function Call: array_to_set('{one,two}'::text[]) (3 rows) create temp table rngfunc(f1 int8, f2 int8); @@ -2064,11 +2064,12 @@ explain (verbose, costs off) select * from testrngfunc(); - QUERY PLAN --------------------------------------------------------- - Result - Output: 7.136178::numeric(35,6), 7.14::numeric(35,2) -(2 rows) + QUERY PLAN +------------------------------------- + Function Scan on public.testrngfunc + Output: f1, f2 + Function Call: testrngfunc() +(3 rows) select * from testrngfunc(); f1 | f2 -- greg
Attachment
On Thu, 16 Jun 2022 at 12:04, Greg Stark <stark@mit.edu> wrote: > > Providing a fixed up search_path for users would give them a smoother > upgrade process where we can give a warning only if the search_path is > changed substantively which is much less likely. > > I'm still quite concerned about the performance impact of having > search_path on so many functions. It causes a cache flush which could > be pretty painful on a frequently called function such as one in an > index expression during a data load.... So it seems I missed a small change in Postgres SQL function world, namely the SQL standard syntax and prosqlbody column from e717a9a18. This feature is huge. It's awesome! It basically provides the lexical scoping feature I was hoping to implement. Any sql immutable standard syntax sql function can be safely used in indexes or elsewhere regardless of your search_path as all the names are already resolved. I'm now thinking we should just provide a LEXICAL option on Postgres style functions to implement the same name path and store sqlbody for them as well. They would have to be bound by the same restrictions (notably no polymorphic parameters) but otherwise I think it should be straightforward. Functions defined this way would always be safe for pg_dump regardless of the search_path used to define them and would also protect users from accidentally corrupting indexes when users have different search_paths. This doesn't really address plpgsql functions of course, I doubt we can do the same thing. -- greg
Re: Tightening behaviour for non-immutable behaviour in immutable functions
From
Andres Freund
Date:
Hi, On 2022-06-16 12:04:12 -0400, Greg Stark wrote: > Of course this patch is still very WIP. Only one or the other function > makes sense to keep. And I'm not opposed to having a GUC to > enable/disable the enforcement or warnings. And the code itself needs > to be cleaned up with parts of it moving to guc.c and/or namespace.c. This currently obviously doesn't pass tests - are you planning to work on this further? As is I'm not really clear what the CF entry is for. Given the current state it doesn't look like it's actually looking for review? Greetings, Andres Freund