Thread: query optimization with UDFs
Hello everybody,

I have a question about the optimization of queries that include UDFs.
Does anybody know what Postgres does to optimize queries with UDFs?
Does the query optimizer do anything special with them?
Thanks,
----
Jungmin Shin
On Mon, 2006-10-09 at 22:49 -0400, jungmin shin wrote:
> Does anybody know what Postgres does to optimize queries with UDFs?

The optimizer considers function volatility to avoid reevaluating UDFs
needlessly, and to use index scans on predicates involving a function.
Also, functions defined in the SQL language will be inlined into the
function call site, when possible. That's all that comes to mind at the
moment...

Notably, the optimizer doesn't have support for realistic costing of
UDFs: it can't tell how expensive evaluating a UDF is going to be, nor
the number and distribution of the rows that will be produced by a
set-returning function. The Berkeley release of Postgres supported
"expensive function optimization" (a.k.a. xfunc), which was an
implementation of Hellerstein and Stonebraker's work on "Predicate
Migration"[1]. That code is no longer in the current Postgres source.

BTW, I think it would make sense to implement a limited subset of the
xfunc ideas: add options to CREATE FUNCTION to allow cost information
to be specified, and then take advantage of this information instead of
using the existing constant kludges. This would be a tangible
improvement, and would have minimal impact on the planner.

A further improvement would be predicate migration: the xfunc code
modified the optimizer to consider pulling predicates up above joins in
the query plan. The traditional assumption is that all predicates are
cheap, so they are always pushed down in the hope that this will reduce
the size of intermediate result sets. Naturally, this is not always
wise with expensive UDFs. After the basics have been done, it might be
interesting to implement this, provided it can be done without
increasing the planner's search space too much.

-Neil

[1] http://db.cs.berkeley.edu/jmh/miscpapers/sigmod93.pdf
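A minimal sketch of the two behaviors described above, assuming a made-up
users table and function name:

  -- An IMMUTABLE function: the planner knows the result depends only on
  -- the arguments, so it can be used in an expression index and matched
  -- against predicates that call it.
  CREATE FUNCTION norm_email(text) RETURNS text AS
    'SELECT lower(trim($1))'
  LANGUAGE sql IMMUTABLE;

  CREATE INDEX users_norm_email_idx ON users (norm_email(email));

  -- Because norm_email() is a simple SQL-language function, the planner
  -- can inline its body into the call site, so this behaves much like
  -- WHERE lower(trim(email)) = 'foo@example.com' and can use the index.
  SELECT * FROM users WHERE norm_email(email) = 'foo@example.com';

A VOLATILE function, by contrast, must be reevaluated at each call and
gets neither of these treatments.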
Neil Conway <neilc@samurai.com> writes:
> BTW, I think it would make sense to implement a limited subset of the
> xfunc ideas: add options to CREATE FUNCTION to allow cost information
> to be specified, and then take advantage of this information instead
> of using the existing constant kludges. This would be a tangible
> improvement, and would have minimal impact on the planner.

The trick is to figure out what a useful parameterized cost model would
look like. IIRC, the main reason the xfunc code rotted on the vine was
that its cost parameters didn't seem to be either easy to select or
powerful in predicting actual cost. We'd have to do better this time.

			regards, tom lane
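As a strawman, the kind of annotation being discussed might look
something like this (hypothetical syntax; the current CREATE FUNCTION
grammar accepts no such clauses, and the names are invented):

  -- Per-call cost expressed relative to a cheap built-in operation, plus
  -- an expected row count for a set-returning function, so the planner
  -- could order predicates and size intermediate results accordingly.
  CREATE FUNCTION slow_distance(a bytea, b bytea) RETURNS float8 AS
    'SELECT 0.0::float8'   -- stand-in body; the real one would be expensive
  LANGUAGE sql IMMUTABLE
  COST 10000;

  CREATE FUNCTION matching_docs(pattern text) RETURNS SETOF text AS
    'SELECT body FROM docs WHERE body LIKE $1'
  LANGUAGE sql STABLE
  ROWS 100;

Even in this simple form, the open question is exactly the one raised
above: whether a single fixed number per function predicts actual cost
well enough to be worth acting on.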
Hi, Tom,

Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
>> BTW, I think it would make sense to implement a limited subset of the
>> xfunc ideas: add options to CREATE FUNCTION to allow cost information
>> to be specified, and then take advantage of this information instead
>> of using the existing constant kludges. This would be a tangible
>> improvement, and would have minimal impact on the planner.
>
> The trick is to figure out what a useful parameterized cost model would
> look like. IIRC, the main reason the xfunc code rotted on the vine was
> that its cost parameters didn't seem to be either easy to select or
> powerful in predicting actual cost. We'd have to do better this time.

I don't know what the xfunc people did, but at least for some varlen
data types (arrays, PostGIS, text), some function costs (concatenation,
GeomUnion etc.) can be estimated via the average field size of the
tables. Has that idea been considered?

HTH,
Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
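The per-column figure Markus is referring to is already collected by
ANALYZE and exposed in the pg_stats view; for example (table and column
names are made up):

  -- Average stored width, in bytes, of the values in a column: the kind
  -- of input a size-based cost estimate for varlena functions could use.
  SELECT tablename, attname, avg_width
  FROM pg_stats
  WHERE tablename = 'docs' AND attname = 'body';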
Markus Schaber <schabi@logix-tt.com> writes:
> Tom Lane wrote:
>> The trick is to figure out what a useful parameterized cost model would
>> look like. IIRC, the main reason the xfunc code rotted on the vine was
>> that its cost parameters didn't seem to be either easy to select or
>> powerful in predicting actual cost. We'd have to do better this time.

> I don't know what the xfunc people did, but at least for some varlen
> data types (arrays, PostGIS, text), some function costs (concatenation,
> GeomUnion etc.) can be estimated via the average field size of the
> tables. Has that idea been considered?

[ shrug... ] Concatenation is definitely not one of the functions we
need to worry about. In fact, I'd say that only functions containing
database accesses are really worth worrying about --- and the hard part
of modeling them is the possible variation in the number of rows
touched depending on their parameters.

			regards, tom lane
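A sketch of the hard case described above, with invented table and
function names: the number of rows the function touches, and therefore
its cost, depends entirely on the argument it receives at run time, so
any single fixed cost or row estimate can be badly wrong.

  -- Cheap for a rare category, expensive for a common one.
  CREATE FUNCTION orders_in_category(cat text) RETURNS bigint AS
    'SELECT count(*) FROM orders WHERE category = $1'
  LANGUAGE sql STABLE;

  -- The planner has no idea how much work each call below will do.
  SELECT * FROM customers c
  WHERE orders_in_category(c.preferred_category) > 100;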