Thread: Three types of functions, ala function redux.

Three types of functions, ala function redux.

From
mlw
Date:
Given this basic SQL statement:

select * from table where col = function() ;

There are three basic types of SQL behaviors that should be able to be
performed.

(1) "function()" returns a single value. Postgres should be able to
understand how to optimize this to be: "select * from table where col =
value" where value is the datum returned by function.

(2) "function()" returns a number of values that are independent of the
query. Postgres should be able to optimize this to be: "select * from
table where col in (val1, val2, val3, ..valn)." I guess Postgres can
loop until done, using the isDone flag?

(3) "function()" returns a value based on the query. (This seems to be
how it currently functions.) where "select * from table where col =
function()" will end up doing a full table scan. 


(1) and (2) are related, and could probably be implemented using the
same code. 
(3) Seems to be how Postgres is currently optimized.

It seems like Tom Lane laid the foundation for this behavior in 7.1
newC. (Does it now work this way?)

Does anyone see a problem with this thinking, and does it make sense to
attempt this for 7.2? I am looking into the function manager stuff to
see what would be involved.

-- 
http://www.mohawksoft.com


Re: Three types of functions, ala function redux.

From
Stephan Szabo
Date:
[I was having trouble with the direct address so i'm only sending to
the list]

> select * from table where col = function() ;

> (2) "function()" returns a number of values that are independent of the
> query. Postgres should be able to optimize this to be: "select * from
> table where col in (val1, val2, val3, ..valn)." I guess Postgres can
> loop until done, using the isDone flag?

I disagree here.  I really don't think that changing = to mean "in"
in the system is a good idea.  If the user wants an in they should 
specify it.
I think "select * from table where col in (select function());" or
"select * from table where col in (select * from function());" or
even "select * from table where col in function();"
are better ways of specifying this sort of behavior.

If we do that (col = <function returning set>) meaning in, then does
col = (select statement that returns multiple rows) mean in and what
about col = <array>?  I think doing it only for the function case is
a mistake.



Re: Three types of functions, ala function redux.

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> There are three basic types of SQL behaviors that should be able to be
> performed.

> (1) "function()" returns a single value. Postgres should be able to
> understand how to optimize this to be: "select * from table where col =
> value" where value is the datum returned by function.

You get this now if the function is marked proiscachable.

> (2) "function()" returns a number of values that are independent of the
> query. Postgres should be able to optimize this to be: "select * from
> table where col in (val1, val2, val3, ..valn)." I guess Postgres can
> loop until done, using the isDone flag?

I object to the notion that "scalar = set" should be automatically
transformed into "scalar IN set".  It would be nice to be smarter about
optimizing IN operations where the subselect only returns a few rows
into multiple indexscans, but how should the planner know that in advance?

> (3) "function()" returns a value based on the query. (This seems to be
> how it currently functions.) where "select * from table where col =
> function()" will end up doing a full table scan. 

You get this now if the function is not marked proiscachable.
        regards, tom lane


Re: Three types of functions, ala function redux.

From
mlw
Date:
Tom Lane wrote:
> 
> mlw <markw@mohawksoft.com> writes:
> > There are three basic types of SQL behaviors that should be able to be
> > performed.
> 
> > (1) "function()" returns a single value. Postgres should be able to
> > understand how to optimize this to be: "select * from table where col =
> > value" where value is the datum returned by function.
> 
> You get this now if the function is marked proiscachable.

Doh! RTFM!

> 
> > (2) "function()" returns a number of values that are independent of the
> > query. Postgres should be able to optimize this to be: "select * from
> > table where col in (val1, val2, val3, ..valn)." I guess Postgres can
> > loop until done, using the isDone flag?
> 
> I object to the notion that "scalar = set" should be automatically
> transformed into "scalar IN set".  It would be nice to be smarter about
> optimizing IN operations where the subselect only returns a few rows
> into multiple indexscans, but how should the planner know that in advance?

That is sort of my point. If one marks a function as "Iscachable" and
returns an isDone as false, will postgres keep calling until all values
have been returned, and then use an index scan with the finite (cached?)
set of results?

If so, this is exactly what I need.

> 
> > (3) "function()" returns a value based on the query. (This seems to be
> > how it currently functions.) where "select * from table where col =
> > function()" will end up doing a full table scan.
> 
> You get this now if the function is not marked proiscachable.

A lot of my confusion has cleared, the "iscachable" flag is an
enlightenment. Boy am I schmuck. ;-}

-- 
http://www.mohawksoft.com