Thread: IMMUTABLE?
Performance Folks, I just had an article[1] published in which I demonstrated recursive PL/pgSQL functions with this function: CREATE OR REPLACE FUNCTION fib ( fib_for int ) RETURNS integer AS $$ BEGIN IF fib_for < 2 THEN RETURN fib_for; END IF; RETURN fib(fib_for - 2) + fib(fib_for - 1); END; $$ LANGUAGE plpgsql; Naturally, it's slow: try=# \timing try=# select fib(28); fib -------- 317811 (1 row) Time: 10642.803 ms Now, I mistakenly said in my article that PostgreSQL doesn't have native memoization, and so demonstrated how to use a table for caching to speed up the function. It's pretty fast: try=# select fib_cached(28); fib_cached ------------ 317811 (1 row) Time: 193.316 ms But over the weekend, I was looking at the Pg docs and saw IMMUTABLE, and said, "Oh yeah!". So I recreated the function with IMMUTABLE. But the performance was not much better: try=# select fib(28); fib -------- 317811 (1 row) Time: 8505.668 ms try=# select fib_cached(28); fib_cached ------------ 317811 (1 row) So, what gives? Am I missing something, or not understanding how IMMUTABLE works? Many TIA, David 1. http://www.onlamp.com/pub/a/onlamp/2006/05/11/postgresql-plpgsql.html
David Wheeler <david@kineticode.com> writes: > So, what gives? Am I missing something, or not understanding how > IMMUTABLE works? The latter. regards, tom lane
On May 15, 2006, at 20:21, Tom Lane wrote: >> So, what gives? Am I missing something, or not understanding how >> IMMUTABLE works? > > The latter. Hee-hee! And after all those nice things I wrote about you in a previous email on this list! But seriously, the documentation says (as if I need to tell you, but I was reading it again to make sure that I'm not insane): > IMMUTABLE indicates that the function always returns the same > result when given the same argument values; that is, it does not do > database lookups or otherwise use information not directly present > in its argument list. If this option is given, any call of the > function with all-constant arguments can be immediately replaced > with the function value. So that seems pretty clear to me. Now, granted, the recursive calls to fib() don't pass a constant argument, but I still would think that after the first time I called fib(28), that the next call to fib(28) would be lightening fast, even if fib(27) wasn't. So, uh, would you mind telling me what I'm missing? I'm happy to turn that knowledge into a documentation patch to help future boneheads like myself. :-) Thanks, David
David Wheeler <david@kineticode.com> writes: > But seriously, the documentation says (as if I need to tell you, but > I was reading it again to make sure that I'm not insane): >> IMMUTABLE indicates that the function always returns the same >> result when given the same argument values; that is, it does not do >> database lookups or otherwise use information not directly present >> in its argument list. If this option is given, any call of the >> function with all-constant arguments can be immediately replaced >> with the function value. Sure. As I read it, that's talking about a static transformation: planner sees 2 + 2 (or if you prefer, int4pl(2,2)), planner runs the function and replaces the expression with 4. Nothing there about memoization. It's true that the system *could* memoize (or in our more usual parlance, cache function values) given the assumptions embodied in IMMUTABLE. But we don't, and I don't see any statement in the docs that promises that we do. For 99% of the functions that the planner deals with, memoization would be seriously counterproductive because the function evaluation cost is comparable to if not less than the lookup cost in a memo table. (int4pl is a good case in point.) regards, tom lane
On Tue, May 16, 2006 at 12:31:41AM -0400, Tom Lane wrote: > It's true that the system *could* memoize (or in our more usual > parlance, cache function values) given the assumptions embodied in > IMMUTABLE. But we don't, and I don't see any statement in the docs > that promises that we do. For 99% of the functions that the planner > deals with, memoization would be seriously counterproductive because > the function evaluation cost is comparable to if not less than the > lookup cost in a memo table. (int4pl is a good case in point.) This seems to change as soon as one takes into account user functions. While most immutable functions really seem to be small and their execution fast, stable functions often hide complex sql (sometimes combined with if-then-else or other program flow logic). So irrespective of caching to prevent evaluation across statements, within a single statement, is there a strong reason why for example in WHERE col = f(const) with f() declared as immutable or stable and without an index on col, f() still gets called for every row? Or is this optimization just not done yet? Joachim
Joachim Wieland <joe@mcknight.de> writes: > So irrespective of caching to prevent evaluation across statements, within a > single statement, is there a strong reason why for example in > WHERE col = f(const) with f() declared as immutable or stable and without an > index on col, f() still gets called for every row? Or is this optimization > just not done yet? The above statement is not correct, at least not for immutable functions. regards, tom lane
On Tue, May 16, 2006 at 09:33:14AM -0400, Tom Lane wrote: > Joachim Wieland <joe@mcknight.de> writes: > > So irrespective of caching to prevent evaluation across statements, within a > > single statement, is there a strong reason why for example in > > WHERE col = f(const) with f() declared as immutable or stable and without an > > index on col, f() still gets called for every row? Or is this optimization > > just not done yet? > The above statement is not correct, at least not for immutable functions. So an immutable function gets evaluated once but a stable function still gets called for every row? Wouldn't it make sense to call a stable function only once as well? Joachim
On May 15, 2006, at 21:31, Tom Lane wrote: > Sure. As I read it, that's talking about a static transformation: > planner sees 2 + 2 (or if you prefer, int4pl(2,2)), planner runs the > function and replaces the expression with 4. Nothing there about > memoization. Oh, I see. So it's more like a constant or C macro. > It's true that the system *could* memoize (or in our more usual > parlance, cache function values) given the assumptions embodied in > IMMUTABLE. But we don't, and I don't see any statement in the docs > that promises that we do. For 99% of the functions that the planner > deals with, memoization would be seriously counterproductive because > the function evaluation cost is comparable to if not less than the > lookup cost in a memo table. (int4pl is a good case in point.) Yes, but there are definitely programming cases where memoization/ caching definitely helps. And it's easy to tell for a given function whether or not it really helps by simply trying it with CACHED and without. Would this be a simple thing to implement? Best, David
> Yes, but there are definitely programming cases where > memoization/caching definitely helps. And it's easy to tell for a given > function whether or not it really helps by simply trying it with CACHED > and without. > > Would this be a simple thing to implement? It's called a "table" :)
On May 16, 2006, at 18:29, Christopher Kings-Lynne wrote: >> Yes, but there are definitely programming cases where memoization/ >> caching definitely helps. And it's easy to tell for a given >> function whether or not it really helps by simply trying it with >> CACHED and without. >> Would this be a simple thing to implement? > > It's called a "table" :) http://www.justatheory.com/computers/databases/postgresql/ higher_order_plpgsql.html Yes, I know. :-P But it'd be easier to have a CACHED keyword, of course. Best, David
On Tue, May 16, 2006 at 07:08:51PM -0700, David Wheeler wrote: > On May 16, 2006, at 18:29, Christopher Kings-Lynne wrote: > > >>Yes, but there are definitely programming cases where memoization/ > >>caching definitely helps. And it's easy to tell for a given > >>function whether or not it really helps by simply trying it with > >>CACHED and without. > >>Would this be a simple thing to implement? > > > >It's called a "table" :) > > http://www.justatheory.com/computers/databases/postgresql/ > higher_order_plpgsql.html > > Yes, I know. :-P But it'd be easier to have a CACHED keyword, of course. Rather than worrying about a generic form of memoization, what would be extremely valuable would be to improve detection of the same function being used multiple times in a query, ie: SELECT moo(x), moo(x)/2 FROM table; AFAIK PostgreSQL will currently execute moo(x) twice. Even if it knows how to optimize this brain-dead example, I think there are other examples it can't optimize right now. Having a much simpler memoization scheme that only works on a tuple-by-tuple basis would probably eliminate a lot of those (It wouldn't work for any executor node that has to read it's entire input before returning anything, though, such as sort). -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461