Re: Function execution costs 'n all that - Mailing list pgsql-hackers

From Mark Cave-Ayland
Subject Re: Function execution costs 'n all that
Date
Msg-id 1168935792.5652.46.camel@mca-desktop
Whole thread Raw
In response to Re: Function execution costs 'n all that  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 2007-01-15 at 15:05 -0500, Tom Lane wrote:
> Brian Hurt <bhurt@janestcapital.com> writes:
> > Non-developer here, but we use a lot of plpgsql functions at work.  And 
> > the functions we use fall into two broad, ill-defined catagories- 
> > "expensive" functions and "cheap" functions.  What I'd like as a user is 
> > some way to tell the planner "this function is expensive- prefer plans 
> > which call this function less even if they're otherwise more expensive" 
> > or "this function is cheap, prefer plans that are otherwise less 
> > expensive even if they call this function more often".  Precise cost 
> > estimates aren't that important, IMHO.
> 
> Right, so a plain constant cost would be plenty for your situation.
> 
> I suspect there's an 80/20 rule at work here --- the estimator-function
> side of this will take most of the effort to design/implement, but not
> get used nearly as much as the plain-constant form ... maybe we should
> just do the constant for starters and see how many people really want to
> write C-code estimators ...
> 
>             regards, tom lane


Hi Tom et al,

Having worked with stored procedures on large datasets for reporting, I
would say that it would be useful to have a non-constant estimator for
the number of rows, whereas a single CPU cost constant should be fine.
Where I have struggled with this has been joining onto slightly more
exotic queries when doing large scale data processing as part of a
custom report or an application upgrade.

Using PL/PGSQL I would find it useful to have access to the constants
passed into a function to be used to help provide a row count estimate
(typically useful for passing in table/column names), e.g.

SELECT * FROM my_func('my_table1') AS t1, my_table2 AS t2 WHERE t1.id =
t2.id;


CREATE FUNCTION my_func(text) AS $$
...
$$ LANGUAGE 'plpgsql' COST 1.0 ROWS my_func_row_cost;


In my cost function, I could then estimate the number of rows using
something like below, where all constants are passed into the cost
function as parameters, e.g.:


CREATE FUNCTION my_func_row_cost(text) AS $$
DECLAREfoo bigint;
BEGINEXECUTE INTO foo 'SELECT COUNT(*) FROM ' || quote_literal($1);RETURN foo;
END;
$$ LANGUAGE 'plpgsql';


In the case where a parameter was not a constant but a column name, then
it would be reasonable in my mind to simply replace that parameter with
NULL when passing to the row cost function, e.g.


SELECT * FROM my_table1 WHERE my_table1.junk = (SELECT
my_func(my_table1.name));


In this case, the text parameter passed to my_func_row_cost would be
replaced by NULL to indicate that it was non-constant.

Of course, even with constants passed upon input, it still may not be
possible to calculate a number of rows that can be returned - it could
be the case that the only parameter passed to cost function has been
converted to NULL because it is a column name. Perhaps in this case it
would be useful to specify returning NULL from my_func_row_cost means "I
can't return anything meaningful, so use the fallback values".


Kind regards,

Mark.




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [GENERAL] Checkpoint request failed on version 8.2.1.
Next
From: Magnus Hagander
Date:
Subject: Re: [GENERAL] Checkpoint request failed on version 8.2.1.