Thread: IMMUTABLE?

IMMUTABLE?

From
David Wheeler
Date:
Performance Folks,

I just had an article[1] published in which I demonstrated recursive
PL/pgSQL functions with this function:

CREATE OR REPLACE FUNCTION fib (
     fib_for int
) RETURNS integer AS $$
BEGIN
     IF fib_for < 2 THEN
         RETURN fib_for;
     END IF;
     RETURN fib(fib_for - 2) + fib(fib_for - 1);
END;
$$ LANGUAGE plpgsql;

Naturally, it's slow:

try=# \timing
try=# select fib(28);
   fib
--------
317811
(1 row)

Time: 10642.803 ms

Now, I mistakenly said in my article that PostgreSQL doesn't have
native memoization, and so demonstrated how to use a table for
caching to speed up the function. It's pretty fast:

try=# select fib_cached(28);
fib_cached
------------
      317811
(1 row)

Time: 193.316 ms

But over the weekend, I was looking at the Pg docs and saw IMMUTABLE,
and said, "Oh yeah!". So I recreated the function with IMMUTABLE. But
the performance was not much better:

try=# select fib(28);
   fib
--------
317811
(1 row)

Time: 8505.668 ms
try=# select fib_cached(28);
fib_cached
------------
      317811
(1 row)

So, what gives? Am I missing something, or not understanding how
IMMUTABLE works?

Many TIA,

David

1. http://www.onlamp.com/pub/a/onlamp/2006/05/11/postgresql-plpgsql.html

Re: IMMUTABLE?

From
Tom Lane
Date:
David Wheeler <david@kineticode.com> writes:
> So, what gives? Am I missing something, or not understanding how
> IMMUTABLE works?

The latter.

            regards, tom lane

Re: IMMUTABLE?

From
David Wheeler
Date:
On May 15, 2006, at 20:21, Tom Lane wrote:

>> So, what gives? Am I missing something, or not understanding how
>> IMMUTABLE works?
>
> The latter.

Hee-hee! And after all those nice things I wrote about you in a
previous email on this list!

But seriously, the documentation says (as if I need to tell you, but
I was reading it again to make sure that I'm not insane):

> IMMUTABLE indicates that the function always returns the same
> result when given the same argument values; that is, it does not do
> database lookups or otherwise use information not directly present
> in its argument list. If this option is given, any call of the
> function with all-constant arguments can be immediately replaced
> with the function value.

So that seems pretty clear to me. Now, granted, the recursive calls
to fib() don't pass a constant argument, but I still would think that
after the first time I called fib(28), that the next call to fib(28)
would be lightening fast, even if fib(27) wasn't.

So, uh, would you mind telling me what I'm missing? I'm happy to turn
that knowledge into a documentation patch to help future boneheads
like myself. :-)

Thanks,

David

Re: IMMUTABLE?

From
Tom Lane
Date:
David Wheeler <david@kineticode.com> writes:
> But seriously, the documentation says (as if I need to tell you, but
> I was reading it again to make sure that I'm not insane):

>> IMMUTABLE indicates that the function always returns the same
>> result when given the same argument values; that is, it does not do
>> database lookups or otherwise use information not directly present
>> in its argument list. If this option is given, any call of the
>> function with all-constant arguments can be immediately replaced
>> with the function value.

Sure.  As I read it, that's talking about a static transformation:
planner sees 2 + 2 (or if you prefer, int4pl(2,2)), planner runs the
function and replaces the expression with 4.  Nothing there about
memoization.

It's true that the system *could* memoize (or in our more usual
parlance, cache function values) given the assumptions embodied in
IMMUTABLE.  But we don't, and I don't see any statement in the docs
that promises that we do.  For 99% of the functions that the planner
deals with, memoization would be seriously counterproductive because
the function evaluation cost is comparable to if not less than the
lookup cost in a memo table.  (int4pl is a good case in point.)

            regards, tom lane

Re: IMMUTABLE?

From
Joachim Wieland
Date:
On Tue, May 16, 2006 at 12:31:41AM -0400, Tom Lane wrote:
> It's true that the system *could* memoize (or in our more usual
> parlance, cache function values) given the assumptions embodied in
> IMMUTABLE.  But we don't, and I don't see any statement in the docs
> that promises that we do.  For 99% of the functions that the planner
> deals with, memoization would be seriously counterproductive because
> the function evaluation cost is comparable to if not less than the
> lookup cost in a memo table.  (int4pl is a good case in point.)

This seems to change as soon as one takes into account user functions.

While most immutable functions really seem to be small and their execution
fast, stable functions often hide complex sql (sometimes combined with
if-then-else or other program flow logic).

So irrespective of caching to prevent evaluation across statements, within a
single statement, is there a strong reason why for example in
WHERE col = f(const) with f() declared as immutable or stable and without an
index on col, f() still gets called for every row? Or is this optimization
just not done yet?


Joachim


Re: IMMUTABLE?

From
Tom Lane
Date:
Joachim Wieland <joe@mcknight.de> writes:
> So irrespective of caching to prevent evaluation across statements, within a
> single statement, is there a strong reason why for example in
> WHERE col = f(const) with f() declared as immutable or stable and without an
> index on col, f() still gets called for every row? Or is this optimization
> just not done yet?

The above statement is not correct, at least not for immutable functions.

            regards, tom lane

Re: IMMUTABLE?

From
Joachim Wieland
Date:
On Tue, May 16, 2006 at 09:33:14AM -0400, Tom Lane wrote:
> Joachim Wieland <joe@mcknight.de> writes:
> > So irrespective of caching to prevent evaluation across statements, within a
> > single statement, is there a strong reason why for example in
> > WHERE col = f(const) with f() declared as immutable or stable and without an
> > index on col, f() still gets called for every row? Or is this optimization
> > just not done yet?

> The above statement is not correct, at least not for immutable functions.

So an immutable function gets evaluated once but a stable function still gets
called for every row? Wouldn't it make sense to call a stable function only
once as well?


Joachim

Re: IMMUTABLE?

From
David Wheeler
Date:
On May 15, 2006, at 21:31, Tom Lane wrote:

> Sure.  As I read it, that's talking about a static transformation:
> planner sees 2 + 2 (or if you prefer, int4pl(2,2)), planner runs the
> function and replaces the expression with 4.  Nothing there about
> memoization.

Oh, I see. So it's more like a constant or C macro.

> It's true that the system *could* memoize (or in our more usual
> parlance, cache function values) given the assumptions embodied in
> IMMUTABLE.  But we don't, and I don't see any statement in the docs
> that promises that we do.  For 99% of the functions that the planner
> deals with, memoization would be seriously counterproductive because
> the function evaluation cost is comparable to if not less than the
> lookup cost in a memo table.  (int4pl is a good case in point.)

Yes, but there are definitely programming cases where memoization/
caching definitely helps. And it's easy to tell for a given function
whether or not it really helps by simply trying it with CACHED and
without.

Would this be a simple thing to implement?

Best,

David

Re: IMMUTABLE?

From
Christopher Kings-Lynne
Date:
> Yes, but there are definitely programming cases where
> memoization/caching definitely helps. And it's easy to tell for a given
> function whether or not it really helps by simply trying it with CACHED
> and without.
>
> Would this be a simple thing to implement?

It's called a "table" :)


Re: IMMUTABLE?

From
David Wheeler
Date:
On May 16, 2006, at 18:29, Christopher Kings-Lynne wrote:

>> Yes, but there are definitely programming cases where memoization/
>> caching definitely helps. And it's easy to tell for a given
>> function whether or not it really helps by simply trying it with
>> CACHED and without.
>> Would this be a simple thing to implement?
>
> It's called a "table" :)

   http://www.justatheory.com/computers/databases/postgresql/
higher_order_plpgsql.html

Yes, I know. :-P But it'd be easier to have a CACHED keyword, of course.

Best,

David

Re: IMMUTABLE?

From
"Jim C. Nasby"
Date:
On Tue, May 16, 2006 at 07:08:51PM -0700, David Wheeler wrote:
> On May 16, 2006, at 18:29, Christopher Kings-Lynne wrote:
>
> >>Yes, but there are definitely programming cases where memoization/
> >>caching definitely helps. And it's easy to tell for a given
> >>function whether or not it really helps by simply trying it with
> >>CACHED and without.
> >>Would this be a simple thing to implement?
> >
> >It's called a "table" :)
>
>   http://www.justatheory.com/computers/databases/postgresql/
> higher_order_plpgsql.html
>
> Yes, I know. :-P But it'd be easier to have a CACHED keyword, of course.

Rather than worrying about a generic form of memoization, what would be
extremely valuable would be to improve detection of the same function
being used multiple times in a query, ie:

SELECT moo(x), moo(x)/2 FROM table;

AFAIK PostgreSQL will currently execute moo(x) twice. Even if it knows
how to optimize this brain-dead example, I think there are other
examples it can't optimize right now. Having a much simpler memoization
scheme that only works on a tuple-by-tuple basis would probably
eliminate a lot of those (It wouldn't work for any executor node that
has to read it's entire input before returning anything, though, such as
sort).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461