Re: Function execution costs 'n all that - Mailing list pgsql-hackers
From | Mark Cave-Ayland |
---|---|
Subject | Re: Function execution costs 'n all that |
Date | |
Msg-id | 1168935792.5652.46.camel@mca-desktop Whole thread Raw |
In response to | Re: Function execution costs 'n all that (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Mon, 2007-01-15 at 15:05 -0500, Tom Lane wrote: > Brian Hurt <bhurt@janestcapital.com> writes: > > Non-developer here, but we use a lot of plpgsql functions at work. And > > the functions we use fall into two broad, ill-defined catagories- > > "expensive" functions and "cheap" functions. What I'd like as a user is > > some way to tell the planner "this function is expensive- prefer plans > > which call this function less even if they're otherwise more expensive" > > or "this function is cheap, prefer plans that are otherwise less > > expensive even if they call this function more often". Precise cost > > estimates aren't that important, IMHO. > > Right, so a plain constant cost would be plenty for your situation. > > I suspect there's an 80/20 rule at work here --- the estimator-function > side of this will take most of the effort to design/implement, but not > get used nearly as much as the plain-constant form ... maybe we should > just do the constant for starters and see how many people really want to > write C-code estimators ... > > regards, tom lane Hi Tom et al, Having worked with stored procedures on large datasets for reporting, I would say that it would be useful to have a non-constant estimator for the number of rows, whereas a single CPU cost constant should be fine. Where I have struggled with this has been joining onto slightly more exotic queries when doing large scale data processing as part of a custom report or an application upgrade. Using PL/PGSQL I would find it useful to have access to the constants passed into a function to be used to help provide a row count estimate (typically useful for passing in table/column names), e.g. SELECT * FROM my_func('my_table1') AS t1, my_table2 AS t2 WHERE t1.id = t2.id; CREATE FUNCTION my_func(text) AS $$ ... $$ LANGUAGE 'plpgsql' COST 1.0 ROWS my_func_row_cost; In my cost function, I could then estimate the number of rows using something like below, where all constants are passed into the cost function as parameters, e.g.: CREATE FUNCTION my_func_row_cost(text) AS $$ DECLAREfoo bigint; BEGINEXECUTE INTO foo 'SELECT COUNT(*) FROM ' || quote_literal($1);RETURN foo; END; $$ LANGUAGE 'plpgsql'; In the case where a parameter was not a constant but a column name, then it would be reasonable in my mind to simply replace that parameter with NULL when passing to the row cost function, e.g. SELECT * FROM my_table1 WHERE my_table1.junk = (SELECT my_func(my_table1.name)); In this case, the text parameter passed to my_func_row_cost would be replaced by NULL to indicate that it was non-constant. Of course, even with constants passed upon input, it still may not be possible to calculate a number of rows that can be returned - it could be the case that the only parameter passed to cost function has been converted to NULL because it is a column name. Perhaps in this case it would be useful to specify returning NULL from my_func_row_cost means "I can't return anything meaningful, so use the fallback values". Kind regards, Mark.
pgsql-hackers by date: