Re: Functionscan estimates - Mailing list pgsql-performance

From Neil Conway
Subject Re: Functionscan estimates
Date
Msg-id 4258BD20.7090907@samurai.com
Whole thread Raw
In response to Re: Functionscan estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Functionscan estimates
List pgsql-performance
Tom Lane wrote:
> The larger point is that writing an estimator for an SRF is frequently a
> task about as difficult as writing the SRF itself

True, although I think this doesn't necessarily kill the idea. If
writing an estimator for a given SRF is too difficult, the user is no
worse off than they are today. Hopefully there would be a fairly large
class of SRFs for which writing an estimator would be relatively simple,
and result in improved planner behavior.

> I don't foresee a whole lot of use of an estimator hook designed as
> proposed here.  In particular, if the API is such that we can only
> use the estimator when all the function arguments are plan-time
> constants, it's not going to be very helpful.

Yes :( One approach might be to break the function's domain into pieces
and have the estimator function calculate the estimated result set size
for each piece. So, given a trivial function like:

foo(int):
    if $1 < 10 then produce 100 rows
    else produce 10000 rows

If the planner has encoded the distribution of input tuples to the
function as a histogram, it could invoke the SRF's estimator function
for the boundary values of each histogram bucket, and use that to get an
idea of the function's likely result set size at runtime.

And yes, the idea as sketched is totally unworkable :) For one thing,
the difficulty of doing this grows rapidly as the number of arguments to
the function increases. But perhaps there is some variant of this idea
that might work...

Another thought is that the estimator could provide information on the
cost of evaluating the function, the number of tuples produced by the
function, and even the distribution of those tuples.

BTW, why is this on -performance? It should be on -hackers.

-Neil

pgsql-performance by date:

Previous
From: Neil Conway
Date:
Subject: Re: Functionscan estimates
Next
From: John A Meinel
Date:
Subject: Re: performance - triggers, row existence etc.