Mark,
> This would only seem to work for trivial functions. Most functions that
> I write are themselves dependent on underlying tables, and without any
> idea how many rows are in the tables, and without any idea of the
> statistical distribution of those rows, I can't really say anything like
> "average rows returned = 5".
>
> What I have wanted for some time is a function pairing system. For each
> set returning function F() I create, I would have the option of creating
> a statistics function S() which returns a single integer which
> represents the guess of how many rows will be returned. S() would be
> called by the planner, and the return value of S() would be used to
> decide the plan. S() would need access to the table statistics
> information. I imagine that the system would want to prevent S() from
> running queries, and only allow it to call certain defined table
> statistics functions and some internal math functions, thereby avoiding
> any infinite recursion in the planner. (If S() ran any queries, those
> queries would go yet again to the planner, and on down the infinite
> recursion you might go.)
>
> Of course, some (possibly most) people could chose not to write an S()
> for their F(), and the default of 1000 rows would continue to be used.
> As such, this new extension to the system would be backwards compatible
> to functions which don't have an S() defined.
I think this is a fine idea, and I think I endorsed it the first time.
However, even a static "function returns #" would be better than what we
have now, and I think the S() method could take quite a bit of engineering
to work out (for example, what if F() is being called in a JOIN or
correlated subquery?). So I'm worried that shooting for the S() idea only
could result in us not doing *anything* for several more versions.
What I'd like to do is implement the constant method for 8.2, and work on
doing the S() method later on. Does that make sense?
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco