Thread: Should planner fold "stable" functions for estimation purposes?

From: Tom Lane
I've been toying with the notion of allowing the planner to compute the
current values of "stable" functions when it's trying to estimate
selectivities.  For instance, in a query like
    select ... where timestampcol >= now() - interval '1 day';

we currently throw up our hands and treat the righthand side as an
unknown quantity for estimation purposes, which leads to selection of
a very conservative default selectivity estimate.  That often
discourages the planner from selecting an indexscan, and can lead to
unreasonably slow join choices at upper levels of the plan.
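
To make the gap concrete, here is a rough sketch of the two estimates in Python. It is not the planner's code: ineq_selectivity, the equi-depth histogram layout, and the numeric "days" column are made up for illustration, and the 1/3 default is an assumed value for an inequality with an unknown comparison value.

```python
from bisect import bisect_right

# Assumed fallback for "col >= <unknown>"; illustrative, not read from any planner source.
DEFAULT_INEQ_SEL = 1.0 / 3.0


def ineq_selectivity(histogram_bounds, value=None):
    """Estimate the fraction of rows with col >= value.

    histogram_bounds are the boundaries of an equi-depth histogram
    (each bucket holds the same share of rows). With value=None the
    comparison value is unknown and the default is returned.
    """
    if value is None:
        return DEFAULT_INEQ_SEL
    lo, hi = histogram_bounds[0], histogram_bounds[-1]
    if value <= lo:
        return 1.0
    if value > hi:
        return 0.0
    nbuckets = len(histogram_bounds) - 1
    # Index of the bucket that contains value, clamped to the last bucket.
    i = min(bisect_right(histogram_bounds, value) - 1, nbuckets - 1)
    left, right = histogram_bounds[i], histogram_bounds[i + 1]
    # Linear interpolation inside the bucket.
    frac = (value - left) / (right - left) if right > left else 0.5
    return 1.0 - (i + frac) / nbuckets


# Column holds "days since start"; 100 days of evenly spread data.
bounds = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
unknown = ineq_selectivity(bounds)       # right-hand side treated as unknown
folded = ineq_selectivity(bounds, 99)    # "now() - 1 day" evaluated to 99
```

With the comparison value unknown the estimate is a third of the table, while evaluating the right-hand side gives roughly one percent, which is the difference between a seqscan and an indexscan looking attractive.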

It would not be correct to reduce the righthand side to a constant in
advance of execution, of course, but is it reasonable to compute its
current value solely for purposes of comparison to column statistics?

The risk we take if we do so is that the estimate we thereby derive
could be stale by the time the generated plan is used, and in the worst
case the plan could be really inappropriate.  On the other hand, in most
of the practical examples that I've seen, the current planner behavior
is producing a pretty inappropriate plan.

A possibly useful compromise is to do this reduction only in
scalarineqsel, where not having any comparison value is really a serious
blow, and not risk it in eqsel, where we can often generate a not-too-awful
estimate without any specific comparison value.
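
As a sketch of that compromise, again in Python and with entirely hypothetical names (Volatility, comparison_value_for_estimate, and the evaluate callback are not planner APIs), the rule would be: always use immutable expressions, evaluate stable ones only when estimating an inequality, and never touch volatile ones.

```python
from enum import Enum


class Volatility(Enum):
    IMMUTABLE = "immutable"
    STABLE = "stable"
    VOLATILE = "volatile"


def comparison_value_for_estimate(volatility, evaluate, is_inequality):
    """Return (known, value) for use in selectivity estimation only.

    evaluate is a zero-argument callable that computes the current value
    of the right-hand side. The value is never substituted into the plan.
    """
    if volatility is Volatility.IMMUTABLE:
        return True, evaluate()
    if volatility is Volatility.STABLE and is_inequality:
        # Risk accepted: the value may be stale by the time the plan runs.
        return True, evaluate()
    # Equality estimates and volatile expressions fall back to defaults.
    return False, None


calls = []


def current_cutoff():
    # Stand-in for "now() - interval '1 day'"; records each evaluation.
    calls.append(1)
    return 99


for_ineq = comparison_value_for_estimate(Volatility.STABLE, current_cutoff, True)
for_eq = comparison_value_for_estimate(Volatility.STABLE, current_cutoff, False)
for_volatile = comparison_value_for_estimate(Volatility.VOLATILE, current_cutoff, True)
```

Only the inequality case evaluates the function, and only once; the other two cases leave it untouched and report the value as unknown.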

Comments?
        regards, tom lane


Re: Should planner fold "stable" functions for estimation purposes?

From: Tom Lane
Rod Taylor <rbt@rbt.ca> writes:
>> It would not be correct to reduce the righthand side to a constant in
>> advance of execution, of course, but is it reasonable to compute its
>> current value solely for purposes of comparison to column statistics?

> So this means it would be double evaluated? A flag will be required to
> prevent this for functions that do more than just return a value or have
> a high cost in execution.

Functions with side-effects had better be marked volatile anyway, so I'm
not worried about that case.  As for the expense argument, keep in mind
that the one extra evaluation in the planner is likely to save you an
awful lot of evaluations at runtime, if it convinces the planner to use
an indexscan and not a seqscan.  We are after all talking about
functions appearing in WHERE, and I wouldn't think that people can
reasonably expect those to get evaluated just once.
        regards, tom lane