Thread: STABLE functions
Hi there, this is actually a repost from the BUGS-list where I got no answer. I'm using 7.3.2 and encounter the problem that a SELECT query that uses a function with a constant argument is quite slow. The function is declared STABLE. Here's an example: SELECT ... FROM table WHERE col = f('xyz'); From what I read in the docs ( http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-createfunction.html ) I thought that this function is only executed once when declared as STABLE, however the time of the query seems to indicate that it is executed for each row of "table". Below is a small script that generates some SQL commands that should show the problem. It creates a table, inserts some rows and defines a function that should just take some time when executed. In my opinion SELECT s FROM test WHERE s = f_test('abc'); should (roughly) be as fast as SELECT f_test('abc'); (for a relatively small table) but it isn't. -----8<------------------ #!/bin/sh echo "CREATE TABLE test (s int);" for i in `seq 0 1000`; doecho "INSERT INTO test VALUES ($i);" done cat << EOF CREATE OR REPLACE FUNCTION f_test(VARCHAR(200)) RETURNS int AS ' DECLAREr RECORD;v VARCHAR(200);a ALIAS FOR \$1; BEGINIF a IS NULL THEN RETURN NULL;END IF;FOR r IN SELECT * FROM test LOOP v = r.s;END LOOP;RETURN 1; END; ' LANGUAGE plpgsql STABLE STRICT; EOF -----8<------------------ These are the times I got: => explain analyze select f_test('abc');Total runtime: 49.52 msec => explain analyze select * from test where s = 1;Total runtime: 4.90 msec => explain analyze select * from test where s = f_test('abc');Total runtime: 65084.03 msec I expected ~ 49.52 msec + 4.90 msec... Can anybody tell me if this is my mistake or PostgreSQL's? Thanks, Joachim
Joachim Wieland <jwieland@kawo2.rwth-aachen.de> writes: > I'm using 7.3.2 and encounter the problem that a SELECT query that uses > a function with a constant argument is quite slow. The function is > declared STABLE. Here's an example: > SELECT ... FROM table WHERE col = f('xyz'); >> From what I read in the docs > ( http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-createfunction.html ) > I thought that this function is only executed once when declared as > STABLE, You are mistaken. If you'd declared it IMMUTABLE, then it would get constant-folded, but that may or may not be suitable for your purpose. regards, tom lane
Hi Tom, On Fri, Apr 25, 2003 at 10:01:44AM -0400, Tom Lane wrote: > > I thought that this function is only executed once when declared as > > STABLE, > You are mistaken. > If you'd declared it IMMUTABLE, then it would get constant-folded, but > that may or may not be suitable for your purpose. The function in question is not IMMUTABLE :-( It depends on database lookups but won't change its results within one transaction. Quotation from the docs: | STABLE indicates that within a single table scan the function will | consistently return the same result for the same argument values, but | that ist result could change across SQL statements. So why is a SELECT ... FROM table WHERE col = f(...) (with a STABLE function f) not a "single table scan" or why does PostgreSQL re-calculate the value of f() here for every row? Thanks a lot, Joachim -- *****PGP key available - send e-mail request***** Due to circumstances beyond your control, you are master of your fate and captain of your soul.
Joachim Wieland <jwieland@kawo2.rwth-aachen.de> writes: > So why is a > SELECT ... FROM table WHERE col = f(...) > (with a STABLE function f) not a "single table scan" The point is that the system *may* choose to evaluate f() only once, not that it *must* do so. The classification exists to make it valid to use an indexscan on "col". regards, tom lane
At 11:13 AM 25/04/2003 -0400, Tom Lane wrote: >The point is that the system *may* choose to evaluate f() only once, >not that it *must* do so. The classification exists to make it valid >to use an indexscan on "col". So just for my own understanding, the optimizer does not know that it can treat a STABLE function f as constant inside an outer loop of t1 in this query: select * from t1,t2 where t2.f1 = f(t1.f1) ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > So just for my own understanding, the optimizer does not know that it can > treat a STABLE function f as constant inside an outer loop of t1 in > this query: > select * from t1,t2 where t2.f1 = f(t1.f1) Sure it does. For example: regression=# create table t1(f1 int); CREATE TABLE regression=# create table t2(f1 int); CREATE TABLE regression=# create index t2f1 on t2(f1); CREATE INDEX regression=# create function f(int) returns int as ' regression'# select $1 + 1' language sql stable; CREATE FUNCTION regression=# explain select * from t1,t2 where t2.f1 = f(t1.f1); QUERY PLAN ----------------------------------------------------------------------Nested Loop (cost=0.00..17175.00 rows=5000 width=8) -> Seq Scan on t1 (cost=0.00..20.00 rows=1000 width=4) -> Index Scan using t2f1 on t2 (cost=0.00..17.08 rows=5width=4) Index Cond: (t2.f1 = f("outer".f1)) (4 rows) Transforming this query into an indexscan is valid only because f() is stable or better. With a non-stable function, you get a plain nestloop: regression=# create function f2(int) returns int as ' regression'# select $1 + 1' language sql; CREATE FUNCTION regression=# explain select * from t1,t2 where t2.f1 = f2(t1.f1); QUERY PLAN ------------------------------------------------------------Nested Loop (cost=0.00..35020.00 rows=5000 width=8) Join Filter:("inner".f1 = f2("outer".f1)) -> Seq Scan on t1 (cost=0.00..20.00 rows=1000 width=4) -> Seq Scan on t2 (cost=0.00..20.00rows=1000 width=4) (4 rows) which is slow but semantically impeccable ;-) The point at issue is that the "stable function" classification was defined and implemented to provide a semantically valid way of deciding whether it's safe to treat an expression as an indexscan qualifier. There is no code that attempts to do anything else with it. regards, tom lane
Hi, On Sun, Apr 27, 2003 at 12:55:34AM -0400, Tom Lane wrote: > The point at issue is that the "stable function" classification was > defined and implemented to provide a semantically valid way of deciding > whether it's safe to treat an expression as an indexscan qualifier. > There is no code that attempts to do anything else with it. Just out of curiosity: What would be an example where you can not treat a stable function as a constant in a single sql query? Thanks, Joachim -- *****PGP key available - send e-mail request***** - ICQ: 37225940 "If you want truly to understand something, try to change it"- Kurt Lewin (1890 - 1947)
> The point at issue is that the "stable function" classification was > defined and implemented to provide a semantically valid way of deciding > whether it's safe to treat an expression as an indexscan qualifier. > There is no code that attempts to do anything else with it. So the problem at hand seems to be, that the optimizer treats the function as beeing too cheap (does it apply a cost at all ?). Since this function was in Pl/SQL it should actually be doable to estimate a cost for the function from statistics. Is there a working way to tell the optimizer that the function is expensive, and thus make it prefer only evaluating it once ? Andreas