Thread: func() & select func()

func() & select func()

From
"Hiroshi Inoue"
Date:
Hi all,

I have seen discussions about iscachable attribute of
functions. Now I'm confused to see a solution in 6.5
(by Shigeru Matsumoto).
=# explain select * from pg_class where oid=1259;

Index Scan using pg_class_oid_index on pg_class  (cost=0.00..2.01rows=1 width=92) 

1) Using non-cachable function f() =# create function f(oid) returns oid as   '    select $1;   ' language 'sql'; =#
explainselect * from pg_class where oid=f(1259);
 
 Seq Scan on pg_class  (cost=0.00..3.17 rows=1 width=92) 

Seems reasonable.

2) Using select f() =# explain select * from pg_class where oid=(select f(1259)); 
 Index Scan using pg_class_oid_index on pg_class  (cost=0.00..2.01rows=1 width=92)   InitPlan    ->  Result
(cost=0.00..0.00rows=0 width=0) 
 

This is the result in my current environment.
Hmm,what's the difference between 1) and 2) ?

Regards.

Hiroshi Inoue


Re: func() & select func()

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> 1) Using non-cachable function f()
>   =# explain select * from pg_class where oid=f(1259);
>   Seq Scan on pg_class  (cost=0.00..3.17 rows=1 width=92) 

> 2) Using select f() 
>  =# explain select * from pg_class where oid=(select f(1259)); 
>   Index Scan using pg_class_oid_index on pg_class  (cost=0.00..2.01
>     rows=1 width=92)
>     InitPlan
-> Result  (cost=0.00..0.00 rows=0 width=0) 

The sub-select is reduced to an initplan --- ie, executed only once,
not once per row --- because it has no dependency on the outer select.

Currently we do not consider the presence of noncachable functions as
a reason that prevents reducing a subplan to an initplan.  I thought
about it but didn't like the performance penalty.  It seems to me that
it's debatable which is the correct semantics, anyway.  Arguably an
outer select *should* assume that a parameterless inner select yields
constant results --- if you don't assume that then it makes no sense
to do joins over the results of sub-SELECTs in FROM, which is a feature
required by full SQL...
        regards, tom lane


RE: func() & select func()

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > 1) Using non-cachable function f()
> >   =# explain select * from pg_class where oid=f(1259);
> >   Seq Scan on pg_class  (cost=0.00..3.17 rows=1 width=92) 
> 
> > 2) Using select f() 
> >  =# explain select * from pg_class where oid=(select f(1259)); 
> >   Index Scan using pg_class_oid_index on pg_class  (cost=0.00..2.01
> >     rows=1 width=92)
> >     InitPlan
> -> Result  (cost=0.00..0.00 rows=0 width=0) 
> 
> The sub-select is reduced to an initplan --- ie, executed only once,
> not once per row --- because it has no dependency on the outer select.
> 
> Currently we do not consider the presence of noncachable functions as
> a reason that prevents reducing a subplan to an initplan.  I thought
> about it but didn't like the performance penalty.  It seems to me that
> it's debatable which is the correct semantics, anyway.  Arguably an

After a little thought,it seems to me that the behavior of the subquery
is more reasonable than current evaluation of functions.

Under MVCC,SELECT returns the content of a database at the time
when the query started no matter how long time it takes to return
the resultset.
Shouldn't functions be evaluated once(per the combination of parameters)
at the time when a query started ?

Regards.

Hiroshi Inoue


Re: func() & select func()

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Currently we do not consider the presence of noncachable functions as
>> a reason that prevents reducing a subplan to an initplan.  I thought
>> about it but didn't like the performance penalty.  It seems to me that
>> it's debatable which is the correct semantics, anyway.  Arguably an

> Shouldn't functions be evaluated once(per the combination of parameters)
> at the time when a query started ?

I don't think I want to buy into guaranteeing that, either.  In the
first place, that makes it impossible to get a random sampling of your
data by methods likeSELECT * FROM foo WHERE random() < 0.01;
Now admittedly this is a little bit flaky (not least because you'd get
inconsistent results if you tried to use a cursor to scan the output
multiple times) but I think it's useful enough to not want to break it.
Especially not when I just recently put a few hours into making the
optimizer treat this case correctly ;-)

In the second place, to guarantee evaluate-once behavior with a
noncooperative function, you'd have to actually maintain a cache of
function parameter sets and result values and consult it before making
a function call.  That we certainly don't want to do.  So I think the
existing distinction between cachable and noncachable functions is
appropriate.  The question is whether the presence of noncachable
functions should propagate out to cause the whole containing SELECT
to be treated as noncachable.

I think you could probably argue that either way, and could invent
examples favoring either choice.  Without a compelling reason to change
the behavior, I'm inclined to leave it as is.
        regards, tom lane


RE: func() & select func()

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Currently we do not consider the presence of noncachable functions as
> >> a reason that prevents reducing a subplan to an initplan.  I thought
> >> about it but didn't like the performance penalty.  It seems to me that
> >> it's debatable which is the correct semantics, anyway.  Arguably an
>
> > Shouldn't functions be evaluated once(per the combination of parameters)
> > at the time when a query started ?
>
> I don't think I want to buy into guaranteeing that, either.

I'm still confused and now suspicious if we could expect
unambiguous results for the queries which constain function
calls which cause strong side effect.
I think there are 2 ways.

1) Function calls with strong side effect should be inhibited   except the simple procedure call query "select func()".
 Seems Oracle has a similar restriction(I don't know details   sorry).
 

2) Users are responsible for calling functions without strong side   effect. Optimizer could freely change the order of
evaluation  and cache the funtion result.
 

> In the
> first place, that makes it impossible to get a random sampling of your
> data by methods like
>     SELECT * FROM foo WHERE random() < 0.01;

I don't understand what we should expect for the query.
Random sampling may be useful but it doesn't necessarily mean
proper.  Shouldn't we make random() an exception by adding
another attribute for it if we expect random sampling ?

BTW for the query   SELECT * FROM foo where random() < 0.01 and id < 100;

Is random() called for each row or for rows which satisfy id < 100 ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: func() & select func()

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> I'm still confused and now suspicious if we could expect
> unambiguous results for the queries which constain function
> calls which cause strong side effect.

So far we have not talked about functions that actually have side
effects, just about how predictable the result of a side-effect-free
function is.  It would be a serious mistake to constrain our handling
of side-effect-free functions on the basis of what's needed to make
side-effects predictable.

At the moment I do not care at all about how predictable side-effects
are --- I think that that's up to the user to deal with.  We have seen
few if any complaints about misoptimization of nextval(), even though
it's theoretically been possible to have a problem with it for a long
time.  For example, inSELECT (column > 1) OR (nextval('seq') > 100) FROM ...
I believe it's been true forever that nextval won't be evaluated at
every column, but how many people complain?  Saying that the behavior
is implementation-defined seems fine to me.

> Random sampling may be useful but it doesn't necessarily mean
> proper.  Shouldn't we make random() an exception by adding
> another attribute for it if we expect random sampling ?

Maybe.  Right now we don't distinguish random() from other functions
that are considerably more predictable, like now().  Perhaps it'd be
worthwhile to recognize more levels of function predictability.
now() could be classified as "fixed result during one transaction",
since I believe it gives back the time of the start of the current
xact.  But I'm not sure it's worth worrying about just for now().  The
hard part would be figuring out a reasonable way to describe functions
that consult database tables --- those are fixed within a transaction
only if the tables they read don't change, but is it worth trying to
describe that?  If so, how?

> BTW for the query
>     SELECT * FROM foo where random() < 0.01 and id < 100;

> Is random() called for each row or for rows which satisfy id < 100 ?

Good question.  I think it'd be a mistake to specify a single answer for
that.  For this particular application, the user wouldn't care anyway.
        regards, tom lane


RE: func() & select func()

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I'm still confused and now suspicious if we could expect
> > unambiguous results for the queries which constain function
> > calls which cause strong side effect.
> 
> So far we have not talked about functions that actually have side
> effects, just about how predictable the result of a side-effect-free
> function is.  It would be a serious mistake to constrain our handling
> of side-effect-free functions on the basis of what's needed to make
> side-effects predictable.
> 
> At the moment I do not care at all about how predictable side-effects
> are --- I think that that's up to the user to deal with. We have seen
> few if any complaints about misoptimization of nextval(), even though
> it's theoretically been possible to have a problem with it for a long
> time.  For example, in
>     SELECT (column > 1) OR (nextval('seq') > 100) FROM ...
> I believe it's been true forever that nextval won't be evaluated at
> every column, but how many people complain?  Saying that the behavior
> is implementation-defined seems fine to me.
>

Agreed.
It seems too painful for optimizer to care about side-effects.
[snip]

> xact.  But I'm not sure it's worth worrying about just for now().  The
> hard part would be figuring out a reasonable way to describe functions
> that consult database tables --- those are fixed within a transaction
> only if the tables they read don't change, but is it worth trying to
> describe that?  If so, how?
>

As to database lookup functions,we could expect fixed results for
one query. MVCC mechanism guarantees it and it's never a trivial
fact. However strictly speaking functions in a query may see the
change by the query itself.  The change could be caused by functions
which insert/update/delete.

Regards.

Hiroshi Inoue