Thread: func() & select func()
Hi all,

I have seen discussions about the iscachable attribute of functions.
Now I'm confused to see a solution in 6.5 (by Shigeru Matsumoto).

=# explain select * from pg_class where oid=1259;
Index Scan using pg_class_oid_index on pg_class
  (cost=0.00..2.01 rows=1 width=92)

1) Using a non-cachable function f()

=# create function f(oid) returns oid as '
     select $1;
   ' language 'sql';
=# explain select * from pg_class where oid=f(1259);
Seq Scan on pg_class  (cost=0.00..3.17 rows=1 width=92)

Seems reasonable.

2) Using select f()

=# explain select * from pg_class where oid=(select f(1259));
Index Scan using pg_class_oid_index on pg_class
  (cost=0.00..2.01 rows=1 width=92)
  InitPlan
    ->  Result  (cost=0.00..0.00 rows=0 width=0)

This is the result in my current environment.
Hmm, what's the difference between 1) and 2)?

Regards.

Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> 1) Using non-cachable function f()
> =# explain select * from pg_class where oid=f(1259);
> Seq Scan on pg_class  (cost=0.00..3.17 rows=1 width=92)

> 2) Using select f()
> =# explain select * from pg_class where oid=(select f(1259));
> Index Scan using pg_class_oid_index on pg_class  (cost=0.00..2.01
> rows=1 width=92)
>   InitPlan
>     ->  Result  (cost=0.00..0.00 rows=0 width=0)

The sub-select is reduced to an initplan --- ie, executed only once,
not once per row --- because it has no dependency on the outer select.

Currently we do not consider the presence of noncachable functions as
a reason that prevents reducing a subplan to an initplan.  I thought
about it but didn't like the performance penalty.  It seems to me that
it's debatable which is the correct semantics, anyway.  Arguably an
outer select *should* assume that a parameterless inner select yields
constant results --- if you don't assume that then it makes no sense
to do joins over the results of sub-SELECTs in FROM, which is a
feature required by full SQL...

			regards, tom lane
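[Editor's note: a sketch, not from the original mails.  The iscachable
attribute mentioned at the top of the thread is the sanctioned way to
tell the planner that a function's result is constant for constant
arguments, so it can be folded without wrapping it in a sub-select.
The function name is illustrative, and exact EXPLAIN output will vary:]

```sql
-- Declaring the function cachable (6.5-era syntax) lets the planner
-- reduce f_cachable(1259) to a constant and choose the index scan
-- directly, without the (select ...) workaround.
create function f_cachable(oid) returns oid as '
  select $1;
' language 'sql' with (iscachable);

explain select * from pg_class where oid = f_cachable(1259);
-- expect an Index Scan using pg_class_oid_index, as in case 2)
```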
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > 1) Using non-cachable function f()
> > =# explain select * from pg_class where oid=f(1259);
> > Seq Scan on pg_class  (cost=0.00..3.17 rows=1 width=92)
>
> > 2) Using select f()
> > =# explain select * from pg_class where oid=(select f(1259));
> > Index Scan using pg_class_oid_index on pg_class  (cost=0.00..2.01
> > rows=1 width=92)
> >   InitPlan
> >     ->  Result  (cost=0.00..0.00 rows=0 width=0)
>
> The sub-select is reduced to an initplan --- ie, executed only once,
> not once per row --- because it has no dependency on the outer select.
>
> Currently we do not consider the presence of noncachable functions as
> a reason that prevents reducing a subplan to an initplan.  I thought
> about it but didn't like the performance penalty.  It seems to me that
> it's debatable which is the correct semantics, anyway.  Arguably an

After a little thought, it seems to me that the behavior of the
subquery is more reasonable than the current evaluation of functions.
Under MVCC, SELECT returns the contents of the database as of the time
the query started, no matter how long it takes to return the result
set.  Shouldn't functions likewise be evaluated once (per combination
of parameters) at the time the query starts?

Regards.

Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Currently we do not consider the presence of noncachable functions as
>> a reason that prevents reducing a subplan to an initplan.  I thought
>> about it but didn't like the performance penalty.  It seems to me that
>> it's debatable which is the correct semantics, anyway.  Arguably an

> Shouldn't functions likewise be evaluated once (per combination of
> parameters) at the time the query starts?

I don't think I want to buy into guaranteeing that, either.  In the
first place, that makes it impossible to get a random sampling of your
data by methods like

	SELECT * FROM foo WHERE random() < 0.01;

Now admittedly this is a little bit flaky (not least because you'd get
inconsistent results if you tried to use a cursor to scan the output
multiple times), but I think it's useful enough to not want to break
it.  Especially not when I just recently put a few hours into making
the optimizer treat this case correctly ;-)

In the second place, to guarantee evaluate-once behavior with a
noncooperative function, you'd have to actually maintain a cache of
function parameter sets and result values, and consult it before
making a function call.  That we certainly don't want to do.

So I think the existing distinction between cachable and noncachable
functions is appropriate.  The question is whether the presence of
noncachable functions should propagate out to cause the whole
containing SELECT to be treated as noncachable.  I think you could
probably argue that either way, and could invent examples favoring
either choice.  Without a compelling reason to change the behavior,
I'm inclined to leave it as is.

			regards, tom lane
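[Editor's note: a sketch of the cursor flakiness mentioned above,
assuming a hypothetical table foo.  Because random() is re-evaluated
per row on every scan rather than once per query, rescanning the same
cursor can yield a different sample:]

```sql
-- Hypothetical session: two passes over the same cursor may return
-- different row sets, since the qual calls a noncachable function.
begin;
declare c scroll cursor for
  select * from foo where random() < 0.01;
fetch all from c;            -- one sample
fetch backward all from c;   -- rescan: possibly a different sample
close c;
commit;
```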
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Currently we do not consider the presence of noncachable functions as
> >> a reason that prevents reducing a subplan to an initplan.  I thought
> >> about it but didn't like the performance penalty.  It seems to me that
> >> it's debatable which is the correct semantics, anyway.  Arguably an
>
> > Shouldn't functions likewise be evaluated once (per combination of
> > parameters) at the time the query starts?
>
> I don't think I want to buy into guaranteeing that, either.

I'm still confused, and now I doubt we can expect unambiguous results
from queries that contain calls to functions with strong side effects.
I think there are two ways:

1) Function calls with strong side effects are prohibited except in
   the simple procedure-call query "select func()".  Oracle seems to
   have a similar restriction (I don't know the details, sorry).

2) Users are responsible for calling only functions without strong
   side effects.  The optimizer could then freely change the order of
   evaluation and cache function results.

> In the
> first place, that makes it impossible to get a random sampling of your
> data by methods like
> 	SELECT * FROM foo WHERE random() < 0.01;

I don't understand what we should expect from that query.  Random
sampling may be useful, but that doesn't necessarily make it proper.
Shouldn't we make random() an exception, by adding another attribute
for it, if we expect random sampling to work?

BTW, for the query

	SELECT * FROM foo WHERE random() < 0.01 AND id < 100;

is random() called for each row, or only for rows which satisfy
id < 100?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> I'm still confused, and now I doubt we can expect unambiguous results
> from queries that contain calls to functions with strong side effects.

So far we have not talked about functions that actually have side
effects, just about how predictable the result of a side-effect-free
function is.  It would be a serious mistake to constrain our handling
of side-effect-free functions on the basis of what's needed to make
side effects predictable.

At the moment I do not care at all about how predictable side effects
are --- I think that that's up to the user to deal with.  We have seen
few if any complaints about misoptimization of nextval(), even though
it's theoretically been possible to have a problem with it for a long
time.  For example, in

	SELECT (column > 1) OR (nextval('seq') > 100) FROM ...

I believe it's been true forever that nextval won't be evaluated for
every row, but how many people complain?  Saying that the behavior is
implementation-defined seems fine to me.

> Random sampling may be useful, but that doesn't necessarily make it
> proper.  Shouldn't we make random() an exception, by adding another
> attribute for it, if we expect random sampling to work?

Maybe.  Right now we don't distinguish random() from other functions
that are considerably more predictable, like now().  Perhaps it'd be
worthwhile to recognize more levels of function predictability.  now()
could be classified as "fixed result during one transaction", since I
believe it gives back the time of the start of the current xact.  But
I'm not sure it's worth worrying about just for now().  The hard part
would be figuring out a reasonable way to describe functions that
consult database tables --- those are fixed within a transaction only
if the tables they read don't change, but is it worth trying to
describe that?  If so, how?
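[Editor's note: a sketch of the nextval() point above, with
hypothetical table and sequence names.  Since the evaluation order and
short-circuiting of OR are implementation-defined, the sequence is not
guaranteed to advance once per row:]

```sql
create sequence seq;
create table t (col int);
insert into t values (0);
insert into t values (2);

-- nextval('seq') need not be called for rows where (col > 1) already
-- decides the OR; how far seq advances is implementation-defined.
select (col > 1) or (nextval('seq') > 100) from t;
select last_value from seq;
```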
> BTW, for the query
> 	SELECT * FROM foo WHERE random() < 0.01 AND id < 100;
> is random() called for each row, or only for rows which satisfy
> id < 100?

Good question.  I think it'd be a mistake to specify a single answer
for that.  For this particular application, the user wouldn't care
anyway.

			regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I'm still confused, and now I doubt we can expect unambiguous results
> > from queries that contain calls to functions with strong side effects.
>
> So far we have not talked about functions that actually have side
> effects, just about how predictable the result of a side-effect-free
> function is.  It would be a serious mistake to constrain our handling
> of side-effect-free functions on the basis of what's needed to make
> side effects predictable.
>
> At the moment I do not care at all about how predictable side effects
> are --- I think that that's up to the user to deal with.  We have seen
> few if any complaints about misoptimization of nextval(), even though
> it's theoretically been possible to have a problem with it for a long
> time.  For example, in
>
> 	SELECT (column > 1) OR (nextval('seq') > 100) FROM ...
>
> I believe it's been true forever that nextval won't be evaluated for
> every row, but how many people complain?  Saying that the behavior is
> implementation-defined seems fine to me.

Agreed.  It seems too painful for the optimizer to care about side
effects.

[snip]

> xact.  But I'm not sure it's worth worrying about just for now().  The
> hard part would be figuring out a reasonable way to describe functions
> that consult database tables --- those are fixed within a transaction
> only if the tables they read don't change, but is it worth trying to
> describe that?  If so, how?

As to database-lookup functions, we can expect fixed results for the
duration of one query: the MVCC mechanism guarantees it, and that is
no trivial fact.  Strictly speaking, however, functions in a query may
see changes made by the query itself --- changes caused by functions
that insert, update, or delete.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp