Thread: wrong search_path being used
I didn't want to report the bug using the form before confirming it here. Here is a gist of what I'm trying: https://gist.github.com/49fcc8c4a5a810f66833#file-cleanup-sql The relevant part being this: old_path := pg_catalog.current_setting('search_path'); raise notice 'setting search_path from % to %', old_path, templ; perform pg_catalog.set_config('search_path', templ, true); ... select count(distinct transaction_id) from public.transaction_condition into temp_count; raise notice '% remaining transactions in public!', temp_count; select count(distinct transaction_id) from transaction_condition into temp_count; raise notice '% remaining transactions', temp_count; For which I get this result (NOTA is Portuguese word for NOTE or NOTICE, not sure...): NOTA: setting search_path from "$user",public to public NOTA: 1030 remaining transactions in public! NOTA: 66 remaining transactions Why do I get different results for both count() queries? The only difference between them is that I made the schema explicit in the first call but since current_path is set to "public" there shouldn't be any difference, right? By the way, 66 is the record count for stock.transaction_condition after calling that function the first time for the stock schema. I've tested the above using PG 9.2.2. Any ideas on what is happening? Thanks in advance, Rodrigo.
Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com> writes: > perform pg_catalog.set_config('search_path', templ, true); > ... > select count(distinct transaction_id) from public.transaction_condition > into temp_count; > raise notice '% remaining transactions in public!', temp_count; > select count(distinct transaction_id) from transaction_condition into > temp_count; If this is inside a plpgsql function that's been executed more than once, the SELECTs would have plans that were cached the first time around, so that what would matter is the search_path that prevailed during the first execution. There have been discussions about changing that but we wouldn't treat it as a back-patchable bug fix, because it would almost certainly break things for somebody. regards, tom lane
Em 07-01-2013 17:17, Tom Lane escreveu: > Rodrigo Rosenfeld Rosas<rr.rosas@gmail.com> writes: >> perform pg_catalog.set_config('search_path', templ, true); >> ... >> select count(distinct transaction_id) from public.transaction_condition >> into temp_count; >> raise notice '% remaining transactions in public!', temp_count; >> select count(distinct transaction_id) from transaction_condition into >> temp_count; > If this is inside a plpgsql function that's been executed more than > once, the SELECTs would have plans that were cached the first time > around, so that what would matter is the search_path that prevailed > during the first execution. There have been discussions about changing > that but we wouldn't treat it as a back-patchable bug fix, because > it would almost certainly break things for somebody. So, Tom, if I understand it correctly, you do consider it a bug but you don't want to backport a fix because it might break existent behavior in some dbs, right? But it is not clear to me if you're willing to fix it for 9.2.3 for instance? It is likely I'll be creating lots of functions to perform the same operations in different schemas as my application is being designed to handle multiple field templates using separate schemas and the application will switch what templates are used based on current_path but lots of database migrations (not sure if you know what I mean for that, but I'm using Rails web framework terminology here) will perform the same changes to the database for each of the supported schemas/templates. So, I'd really like this behavior to be fixed in future versions of PG if possible. What is your opinion on the subject? Thanks for the prompt answer! Cheers, Rodrigo.
Rodrigo Rosenfeld Rosas wrote: > Tom Lane wrote: >> There have been discussions about changing that > if I understand it correctly, you do consider it a bug but you > don't want to backport a fix because it might break existent > behavior in some dbs, right? No, there has been discussion about whether different behavior would be better in future major releases, but no consensus has been reached. >> but we wouldn't treat it as a back-patchable bug fix, because >> it would almost certainly break things for somebody. > > But it is not clear to me if you're willing to fix it for 9.2.3 > for instance? Back-patching means changing things in a minor release, where things only change after the second dot. We don't make changes in user-visible behavior like this in minor releases; so no, we would not make a change like this in 9.2.3 or any other 9.2 version. -Kevin
Em 09-01-2013 20:09, Kevin Grittner escreveu: > Rodrigo Rosenfeld Rosas wrote: >> Tom Lane wrote: >>> There have been discussions about changing that >> if I understand it correctly, you do consider it a bug but you >> don't want to backport a fix because it might break existent >> behavior in some dbs, right? > No, there has been discussion about whether different behavior > would be better in future major releases, but no consensus has been > reached. > >>> but we wouldn't treat it as a back-patchable bug fix, because >>> it would almost certainly break things for somebody. >> But it is not clear to me if you're willing to fix it for 9.2.3 >> for instance? > Back-patching means changing things in a minor release, where > things only change after the second dot. We don't make changes in > user-visible behavior like this in minor releases; so no, we would > not make a change like this in 9.2.3 or any other 9.2 version. > Ok, thanks for the explanation, Kevin. I'm curious though. Why wouldn't this behavior be considered a bug? Is there any link to previous discussions about this subject that I could read?
Rodrigo Rosenfeld Rosas wrote: > I'm curious though. Why wouldn't this behavior be considered a > bug? Is there any link to previous discussions about this subject > that I could read? A plpgsql function generates a plan on initial execution which chooses which particular tables are used. On subsequent executions that saves the overhead of parsing out the statement and looking up the table and its details, which can help performance quite a bit. If you want to have source code evaluated each time the function is called, you should use the EXECUTE statement within your function body, which will force parsing and path resolution of the statement for each execution. That should give you the behavior you want, with some cost in performance. To try to get your function code to work as you expect, the language would essentially need to identify which statements could be pre-planned, and which would needed to be treated as raw source on each execution. That would be tricky to implement, and would itself have some run-time cost. At this point we've put the burden on the programmer to identify this at the time the code is written, rather than adding run-time expense. -Kevin
"Kevin Grittner" <kgrittn@mail.com> writes: > To try to get your function code to work as you expect, the > language would essentially need to identify which statements could > be pre-planned, and which would needed to be treated as raw source > on each execution. That would be tricky to implement, and would > itself have some run-time cost. At this point we've put the burden > on the programmer to identify this at the time the code is written, > rather than adding run-time expense. I think that the alternative most likely to succeed is to consider any change in the active value of search_path as forcing replanning of cached plans. This wouldn't be that hard to implement but there's a definite risk of loss of performance due to unnecessary replanning (since the path change might or might not affect the particular query). It's also not unlikely that it could break applications that work today, because they depend -- perhaps without being aware of it -- on the existing first-path-wins behavior. Having said that, it seems likely that more people would prefer that behavior than the existing one. But it hasn't been clear enough to justify making such a subtly incompatible change. regards, tom lane
On 2013-01-12 14:29:38 -0500, Tom Lane wrote: > "Kevin Grittner" <kgrittn@mail.com> writes: > > To try to get your function code to work as you expect, the > > language would essentially need to identify which statements could > > be pre-planned, and which would needed to be treated as raw source > > on each execution. That would be tricky to implement, and would > > itself have some run-time cost. At this point we've put the burden > > on the programmer to identify this at the time the code is written, > > rather than adding run-time expense. > > I think that the alternative most likely to succeed is to consider any > change in the active value of search_path as forcing replanning of > cached plans. This wouldn't be that hard to implement but there's > a definite risk of loss of performance due to unnecessary replanning > (since the path change might or might not affect the particular query). > It's also not unlikely that it could break applications that work today, > because they depend -- perhaps without being aware of it -- on the > existing first-path-wins behavior. > > Having said that, it seems likely that more people would prefer that > behavior than the existing one. But it hasn't been clear enough to > justify making such a subtly incompatible change. Its a somewhat common practise to use SET in functions or as a configuration parameter to functions. I think at least the latter should still work without forcing to replan any query. Given that we advice setting the search path for SECURITY DEFINER... I guess it wouldn't really be feasible to keep the search path used to plan a query in its cached form and check that it fits the one currently used on every use of the cached plan? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote: > On 2013-01-12 14:29:38 -0500, Tom Lane wrote: >> "Kevin Grittner" <kgrittn@mail.com> writes: >>> To try to get your function code to work as you expect, the >>> language would essentially need to identify which statements could >>> be pre-planned, and which would needed to be treated as raw source >>> on each execution. That would be tricky to implement, and would >>> itself have some run-time cost. At this point we've put the burden >>> on the programmer to identify this at the time the code is written, >>> rather than adding run-time expense. >> >> I think that the alternative most likely to succeed is to consider any >> change in the active value of search_path as forcing replanning of >> cached plans. This wouldn't be that hard to implement but there's >> a definite risk of loss of performance due to unnecessary replanning >> (since the path change might or might not affect the particular query). >> It's also not unlikely that it could break applications that work today, >> because they depend -- perhaps without being aware of it -- on the >> existing first-path-wins behavior. >> >> Having said that, it seems likely that more people would prefer that >> behavior than the existing one. But it hasn't been clear enough to >> justify making such a subtly incompatible change. > > Its a somewhat common practise to use SET in functions or as a > configuration parameter to functions. I think at least the latter should > still work without forcing to replan any query. Given that we advice > setting the search path for SECURITY DEFINER... > > I guess it wouldn't really be feasible to keep the search path used to > plan a query in its cached form and check that it fits the one currently > used on every use of the cached plan? The particular complaint here is about: Â perform pg_catalog.set_config('search_path', text_parameter, true); Â select ... from unqualified_tablename ...; The table used is based on the parameter passed on the first execution; the local search_path set on subsequent executions is ignored. Â Perhaps we should just not save a plan for any function which sets a new search_path while it is executing? -Kevin
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-12 14:29:38 -0500, Tom Lane wrote: >> I think that the alternative most likely to succeed is to consider any >> change in the active value of search_path as forcing replanning of >> cached plans. > I guess it wouldn't really be feasible to keep the search path used to > plan a query in its cached form and check that it fits the one currently > used on every use of the cached plan? Actually that's exactly what I meant: every time we arrive at a query with a cached plan, check to see if the active search_path value is the same as what it was when we made the cached plan, and replan if not. There's already infrastructure to save the search_path value for a plan, but what it's being used for right now is to restore the first-plan-time value of the path when a replan is forced for some other reason. It wouldn't be that hard to change it around to use it this way instead. regards, tom lane
Em 12-01-2013 18:13, Tom Lane escreveu: > Andres Freund<andres@2ndquadrant.com> writes: >> On 2013-01-12 14:29:38 -0500, Tom Lane wrote: >>> I think that the alternative most likely to succeed is to consider any >>> change in the active value of search_path as forcing replanning of >>> cached plans. >> I guess it wouldn't really be feasible to keep the search path used to >> plan a query in its cached form and check that it fits the one currently >> used on every use of the cached plan? > Actually that's exactly what I meant: every time we arrive at a query > with a cached plan, check to see if the active search_path value is the > same as what it was when we made the cached plan, and replan if not. > > There's already infrastructure to save the search_path value for a plan, > but what it's being used for right now is to restore the first-plan-time > value of the path when a replan is forced for some other reason. It > wouldn't be that hard to change it around to use it this way instead. > > regards, tom lane Something that would be really handy for applications using schemas for implementing multi-tenant support would be allowing usage of a function param in the SET section of the function. Something like this: CREATE FUNCTION some_function(p_schema, ...) RETURNS VOID AS $$ BEGIN ... END; $$ LANGUAGE plpgsql SET search_path = p_schema, public; Not sure what syntax to use since p_schema could be the name of some existent schema but you got the idea. This would avoid all the wrapper lines to save and restore the old search_path (specially when there are earlier returns in the function body).
On 2013-01-12 15:13:51 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-01-12 14:29:38 -0500, Tom Lane wrote: > >> I think that the alternative most likely to succeed is to consider any > >> change in the active value of search_path as forcing replanning of > >> cached plans. > > > I guess it wouldn't really be feasible to keep the search path used to > > plan a query in its cached form and check that it fits the one currently > > used on every use of the cached plan? > > Actually that's exactly what I meant: every time we arrive at a query > with a cached plan, check to see if the active search_path value is the > same as what it was when we made the cached plan, and replan if not. Okay. I was afraid it would add noticeable overhead to stuff like plpgsql... Maybe would lower that by having a "search_path" generation counter that gets increased everytime it gets changed but that seems a bit too complicated. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
2013/1/13 Andres Freund <andres@2ndquadrant.com>: > On 2013-01-12 15:13:51 -0500, Tom Lane wrote: >> Andres Freund <andres@2ndquadrant.com> writes: >> > On 2013-01-12 14:29:38 -0500, Tom Lane wrote: >> >> I think that the alternative most likely to succeed is to consider any >> >> change in the active value of search_path as forcing replanning of >> >> cached plans. >> >> > I guess it wouldn't really be feasible to keep the search path used to >> > plan a query in its cached form and check that it fits the one currently >> > used on every use of the cached plan? >> >> Actually that's exactly what I meant: every time we arrive at a query >> with a cached plan, check to see if the active search_path value is the >> same as what it was when we made the cached plan, and replan if not. > > Okay. I was afraid it would add noticeable overhead to stuff like > plpgsql... Maybe would lower that by having a "search_path" generation > counter that gets increased everytime it gets changed but that seems a > bit too complicated. > we can use same mechanism, that is used for plpgsql polymorphic functions - different parameters => different instances of plpgsql function. we can store n instances per function - probably there can be memory issues when too much different search paths will be used - it is optimal probably for 10 - 100 different search paths different solution - can we specify search_path early in connection string? Then this information can be used by connection pooler - and we can do replan when different search_path is identified. Regards Pavel > Greetings, > > Andres Freund > > -- > Andres Freund http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-bugs
Just adding in my two cents... At my last company we had several hundred stored procedures which all set a search path dynamically based on an input argument prior to running any queries. The current behavior is very alarming and challenging to consistently workaround. Would be very happy to see a future version without the search path catching issues. Cheers, -- Casey Allen Shobe casey@shobe.info
Hi Tom, Why not implement a session/database option to use (or not) the replanning of such functions. This would allow existing PG clusters to behave as they historically did, and would simultaneously allow users who want to have benefit of dynamically changing "search_path" inside a function to obtain the expected result. IMO, the query plan cache is a huge mistake, if it couldn't be disabled (in another way than using EXECUTE). Perhaps based on the volatility of the function ? I intensively use schemas in a "heavy" PG database (understand business-oriented logic db) and I'm today blocked by not being able to select schemas in functions called several times in a session, leading to wrong results. If performance is a good thing, getting the expected result is primordial. Lastly, is it planned to resolve this bug quickly ? Thanks for your attention - and your involvement in PG project ;-) -- View this message in context: http://postgresql.1045698.n5.nabble.com/wrong-search-path-being-used-tp5739027p5752394.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.