Thread: Advice on setting cost for function

Advice on setting cost for function

From
guyren@relevantlogic.com
Date:
The documentation in CREATE FUNCTION is fine as far is it goes regarding the COST setting, but that isn’t very far, and I haven’t had any luck finding good advice.

How do I determine what value to set as the COST of a function? I understand that it’s a relative, abstract, per-row sort of thing, but that’s not quite enough to go on to do it well.

If I have a function that looks up a single value from an index, should that be lower than 100? 20, say?

If I have a function that calls multiple other functions, each of which does a reasonable amount of work, should I set the caller to a higher COST, or will Postgres use the costs of the functions it calls?

I can imagine that a SQL function which is just inlined might ignore the COST estimate. Is that so? Are there other such considerations?

In general, a section in the CREATE FUNCTION documentation of two or three paragraphs with a few examples and general guidelines would be very helpful. I would be happy to write it if someone will explain it to me.

Re: Advice on setting cost for function

From
Tom Lane
Date:
guyren@relevantlogic.com writes:
> The documentation in CREATE FUNCTION is fine as far is it goes regarding the COST setting, but that isn’t very far,
andI haven’t had any luck finding good advice. 

Fair complaint.  A quick data dump:

procost = 1 is supposed to represent the cost of a simple built-in
function --- think addition or comparison.  Expensive C functions,
such as tsvector parsing, are supposed to have procosts like 10 or 100.
Keep in mind that procost is actually scaled by cpu_operator_cost
(0.0025 typically) before being folded into a plan cost estimate.

PL functions are inevitably going to be way more expensive than a simple
built-in function.  It's likely that the default procost setting of 100
is a drastic underestimate in most cases, and that a saner default might
be more like 1000.  I'm hesitant to change that for fear of causing
surprising plan changes, though.

Both the CREATE FUNCTION man page and the pg_proc catalog documentation
claim that for a set-returning function, procost is per-output-row.
That may have been true when written but it seems to be horsepucky now;
it's only charged once regardless.  We should change the docs.

> If I have a function that looks up a single value from an index, should that be lower than 100? 20, say?

Well, if you accept the planner's default estimate that a single-row
indexed lookup costs around 8 cost units (twice random_page_cost),
then a function embodying that should also cost that much; dividing by
cpu_operator_cost leads to the conclusion that its procost should be
3200.  There are lots of reasons to be more optimistic than that, but
for sure I wouldn't think that a function that embodies database access
should have procost less than 100.

> I can imagine that a SQL function which is just inlined might ignore the COST estimate. Is that so? Are there other
suchconsiderations? 

Yeah, once it gets inlined its procost marking is no longer a factor;
we'll look at the substituted expression instead.

> If I have a function that calls multiple other functions, each of which does a reasonable amount of work, should I
setthe caller to a higher COST, or will Postgres use the costs of the functions it calls? 

Nope, you'll need to adjust the cost of the calling function.
Except for the inlined case, PG will just take the cost marking
at face value.

I believe that the PostGIS people just recently increased the
cost markings on all their expensive functions to better reflect
reality.  You might want to go dig in their git repo to see what
they did (I don't think those changes are released yet).

> In general, a section in the CREATE FUNCTION documentation of two or three paragraphs with a few examples and general
guidelineswould be very helpful. I would be happy to write it if someone will explain it to me. 

Have at it ...

            regards, tom lane