Thread: BUG #2150: PL/Python function delcared STABLE gets run repeatedly
The following bug has been logged online: Bug reference: 2150 Logged by: Aaron Swartz Email address: me@aaronsw.com PostgreSQL version: 8.1.1 Operating system: Mac OS X Description: PL/Python function delcared STABLE gets run repeatedly Details: We have the PL/Python function get_foo(n) that returns an array. If this function is declared IMMUTABLE, then: SELECT * FROM tablenamehere WHERE id = any(get_foo(21)); returns quickly. If the function is declared STABLE, however, PostgreSQL seems to incorrectly run it for every row in the table and it takes forever.
"Aaron Swartz" <me@aaronsw.com> writes: > We have the PL/Python function get_foo(n) that returns an array. If this > function is declared IMMUTABLE, then: > SELECT * FROM tablenamehere WHERE id = any(get_foo(21)); > returns quickly. If the function is declared STABLE, however, PostgreSQL > seems to incorrectly run it for every row in the table and it takes forever. = ANY isn't indexable in 8.1, so there's no obvious reason why either version would be fast. You're going to have to provide more detail if you want help. For starters, what does EXPLAIN ANALYZE show in each case? What's the schema of the table (\d output will do)? regards, tom lane
Aaron Swartz <me@aaronsw.com> writes: > The function we're running returns an array, and it is non-trivial to > compute. We do a simple query with it like "SELECT * FROM > tablename WHERE id = any(foo(21))"). > When the function is STABLE (or VOLATILE) the function is run > to generate the array every time. If the function is IMMUTABLE, > the array is computed only once for this query, as we'd expect, > and the query is fast. Oh, you are misunderstanding the point of IMMUTABLE/STABLE. STABLE essentially gives the planner permission to use the function in an indexscan qualification. It does *not* cause any caching of the function result in other contexts, which is what you seem to be wishing would happen. IMMUTABLE/STABLE/VOLATILE are promises from you to the system about the behavior of the function, not promises from the system about how it will choose to evaluate the function. What I'd suggest is recasting the function to return a SETOF result instead of an array, and then writing SELECT * FROM tablename WHERE id IN (select * from foo(21)) This should get you a plan that will work reasonably well for you. regards, tom lane
On Fri, Jan 06, 2006 at 12:43:06PM -0500, Tom Lane wrote: > Oh, you are misunderstanding the point of IMMUTABLE/STABLE. > STABLE essentially gives the planner permission to use the function > in an indexscan qualification. It does *not* cause any caching of > the function result in other contexts, which is what you seem to be > wishing would happen. Is caching of results for STABLE/IMMUTABLE functions called with a constant something that would be reasonable to add? I certainly wish this happened natively, but I've always hacked around it by SELECT ... WHERE x = (SELECT foo(21)) so presumably some kind of query transform logic similar to what was done for min/max might work. I don't see anything about this on the TODO... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > Is caching of results for STABLE/IMMUTABLE functions called with a > constant something that would be reasonable to add? I tend to think not: the distributed overhead would outweigh the advantages. See previous discussions in the archives. (However, the previous discussions have imagined an explicit cache that stores function names, arguments, and results. Your thought of transforming the call into an InitPlan node might be interesting. The trick is still to know which functions are expensive enough to justify the overhead of an InitPlan.) regards, tom lane
On Fri, Jan 06, 2006 at 02:48:59PM -0500, Tom Lane wrote: > "Jim C. Nasby" <jnasby@pervasive.com> writes: > > Is caching of results for STABLE/IMMUTABLE functions called with a > > constant something that would be reasonable to add? > > I tend to think not: the distributed overhead would outweigh the > advantages. See previous discussions in the archives. > > (However, the previous discussions have imagined an explicit cache > that stores function names, arguments, and results. Your thought > of transforming the call into an InitPlan node might be interesting. > The trick is still to know which functions are expensive enough to > justify the overhead of an InitPlan.) > Oh, yeah, actually caching results between queries doesn't make sense; I think that's pretty much a corner-case that people should be expected to code up/handle themselves. I'm just looking for only executing the function once per query. Yeah, some heuristics to try and tweak between adding a seperate planning node and just running the function a few times would be nice, but I suspect that's overkill. I think it would be fine to just assume that any time you're going to execute a function more than X number of times (where X should be easy to deduce during query planning), you'll be better off taking the cost to tweak the plan to only run it once. How does the overhead of an InitPlan compare to the overhead of calling a SQL function? Or plpgsql? Or one of the external function languages? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461