Thread: STABLE functions

STABLE functions

From
Joachim Wieland
Date:
Hi there,

this is actually a repost from the BUGS-list where I got no answer.

I'm using 7.3.2 and encounter the problem that a SELECT query that uses
a function with a constant argument is quite slow. The function is
declared STABLE. Here's an example:

SELECT ... FROM table WHERE col = f('xyz');

From what I read in the docs

( http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-createfunction.html )

I thought that this function is only executed once when declared as
STABLE, however the time of the query seems to indicate that it is
executed for each row of "table".

Below is a small script that generates some SQL commands that should show
the problem. It creates a table, inserts some rows and defines a
function that should just take some time when executed.

In my opinion
SELECT s FROM test WHERE s = f_test('abc');

should (roughly) be as fast as
SELECT f_test('abc');

(for a relatively small table) but it isn't.

-----8<------------------

#!/bin/sh

echo "CREATE TABLE test (s int);"

for i in `seq 0 1000`; doecho "INSERT INTO test VALUES ($i);"
done

cat << EOF
CREATE OR REPLACE FUNCTION f_test(VARCHAR(200)) RETURNS int AS '
DECLAREr RECORD;v VARCHAR(200);a ALIAS FOR \$1;
BEGINIF a IS NULL THEN    RETURN NULL;END IF;FOR r IN SELECT * FROM test LOOP    v = r.s;END LOOP;RETURN 1;
END;
' LANGUAGE plpgsql STABLE STRICT;

EOF

-----8<------------------

These are the times I got:

=> explain analyze select f_test('abc');Total runtime: 49.52 msec

=> explain analyze select * from test where s = 1;Total runtime: 4.90 msec

=> explain analyze select * from test where s = f_test('abc');Total runtime: 65084.03 msec

I expected ~ 49.52 msec + 4.90 msec...

Can anybody tell me if this is my mistake or PostgreSQL's?


Thanks,
Joachim



Re: STABLE functions

From
Tom Lane
Date:
Joachim Wieland <jwieland@kawo2.rwth-aachen.de> writes:
> I'm using 7.3.2 and encounter the problem that a SELECT query that uses
> a function with a constant argument is quite slow. The function is
> declared STABLE. Here's an example:

> SELECT ... FROM table WHERE col = f('xyz');

>> From what I read in the docs

> ( http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-createfunction.html )

> I thought that this function is only executed once when declared as
> STABLE,

You are mistaken.

If you'd declared it IMMUTABLE, then it would get constant-folded, but
that may or may not be suitable for your purpose.
        regards, tom lane



Re: STABLE functions

From
Joachim Wieland
Date:
Hi Tom,

On Fri, Apr 25, 2003 at 10:01:44AM -0400, Tom Lane wrote:
> > I thought that this function is only executed once when declared as
> > STABLE,

> You are mistaken.

> If you'd declared it IMMUTABLE, then it would get constant-folded, but
> that may or may not be suitable for your purpose.

The function in question is not IMMUTABLE :-( It depends on database
lookups but won't change its results within one transaction.

Quotation from the docs:

| STABLE indicates that within a single table scan the function will
| consistently return the same result for the same argument values, but
| that ist result could change across SQL statements.

So why is a
 SELECT ... FROM table WHERE col = f(...)

(with a STABLE function f) not a "single table scan" or why does
PostgreSQL re-calculate the value of f() here for every row?


Thanks a lot,
Joachim

-- 
*****PGP key available - send e-mail request*****
Due to circumstances beyond your control, you are master of your fate
and captain of your soul.



Re: STABLE functions

From
Tom Lane
Date:
Joachim Wieland <jwieland@kawo2.rwth-aachen.de> writes:
> So why is a
>   SELECT ... FROM table WHERE col = f(...)
> (with a STABLE function f) not a "single table scan" 

The point is that the system *may* choose to evaluate f() only once,
not that it *must* do so.  The classification exists to make it valid
to use an indexscan on "col".
        regards, tom lane



Re: STABLE functions

From
Philip Warner
Date:
At 11:13 AM 25/04/2003 -0400, Tom Lane wrote:
>The point is that the system *may* choose to evaluate f() only once,
>not that it *must* do so.  The classification exists to make it valid
>to use an indexscan on "col".

So just for my own understanding, the optimizer does not know that it can 
treat a STABLE function f as constant inside an outer loop of t1 in this query:
    select * from t1,t2 where t2.f1 = f(t1.f1)




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: STABLE functions

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> So just for my own understanding, the optimizer does not know that it can 
> treat a STABLE function f as constant inside an outer loop of t1 in
> this query:

>      select * from t1,t2 where t2.f1 = f(t1.f1)

Sure it does.  For example:

regression=# create table t1(f1 int);
CREATE TABLE
regression=# create table t2(f1 int);
CREATE TABLE
regression=# create index t2f1 on t2(f1);
CREATE INDEX
regression=# create function f(int) returns int as '
regression'# select $1 + 1' language sql stable;
CREATE FUNCTION
regression=# explain select * from t1,t2 where t2.f1 = f(t1.f1);                             QUERY PLAN
----------------------------------------------------------------------Nested Loop  (cost=0.00..17175.00 rows=5000
width=8) ->  Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=4)  ->  Index Scan using t2f1 on t2  (cost=0.00..17.08
rows=5width=4)        Index Cond: (t2.f1 = f("outer".f1))
 
(4 rows)

Transforming this query into an indexscan is valid only because f()
is stable or better.  With a non-stable function, you get a plain
nestloop:

regression=# create function f2(int) returns int as '
regression'# select $1 + 1' language sql;
CREATE FUNCTION
regression=# explain select * from t1,t2 where t2.f1 = f2(t1.f1);                        QUERY PLAN
------------------------------------------------------------Nested Loop  (cost=0.00..35020.00 rows=5000 width=8)  Join
Filter:("inner".f1 = f2("outer".f1))  ->  Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=4)  ->  Seq Scan on t2
(cost=0.00..20.00rows=1000 width=4)
 
(4 rows)

which is slow but semantically impeccable ;-)

The point at issue is that the "stable function" classification was
defined and implemented to provide a semantically valid way of deciding
whether it's safe to treat an expression as an indexscan qualifier.
There is no code that attempts to do anything else with it.
        regards, tom lane



Re: STABLE functions

From
Joachim Wieland
Date:
Hi,

On Sun, Apr 27, 2003 at 12:55:34AM -0400, Tom Lane wrote:
> The point at issue is that the "stable function" classification was
> defined and implemented to provide a semantically valid way of deciding
> whether it's safe to treat an expression as an indexscan qualifier.
> There is no code that attempts to do anything else with it.

Just out of curiosity: What would be an example where you can not treat
a stable function as a constant in a single sql query?


Thanks,
Joachim

-- 
*****PGP key available - send e-mail request*****   -   ICQ: 37225940
"If you want truly to understand something, try to change it"- Kurt Lewin (1890 - 1947)



Re: STABLE functions

From
"Zeugswetter Andreas SB SD"
Date:
> The point at issue is that the "stable function" classification was
> defined and implemented to provide a semantically valid way of deciding
> whether it's safe to treat an expression as an indexscan qualifier.
> There is no code that attempts to do anything else with it.

So the problem at hand seems to be, that the optimizer treats the function
as beeing too cheap (does it apply a cost at all ?). Since this function was in
Pl/SQL it should actually be doable to estimate a cost for the function from
statistics.
Is there a working way to tell the optimizer that the function is expensive,
and thus make it prefer only evaluating it once ?

Andreas