Thread: get_relation_stats_hook()
Currently we have a plugin capability for get_relation_info_hook(), but no corresponding capability for statistics info. So, all calls to SearchSysCache would be replaced with a call to get_relation_info_hook(), if present. Any objections, thoughts? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > Currently we have a plugin capability for get_relation_info_hook(), but > no corresponding capability for statistics info. > > So, all calls to SearchSysCache would be replaced with a call to > get_relation_info_hook(), if present. I assume you meant get_relation_stats_hook in the last paragraph, but I can't understand what you mean with SearchSysCache (surely that's not it.) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Simon Riggs <simon@2ndquadrant.com> writes: > Currently we have a plugin capability for get_relation_info_hook(), but > no corresponding capability for statistics info. > So, all calls to SearchSysCache would be replaced with a call to > get_relation_info_hook(), if present. Surely you didn't mean ALL calls. Please be more specific about what you're proposing. regards, tom lane
On Thu, 2008-06-26 at 11:18 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Currently we have a plugin capability for get_relation_info_hook(), but > > no corresponding capability for statistics info. > > > So, all calls to SearchSysCache would be replaced with a call to > > get_relation_info_hook(), if present. > > Surely you didn't mean ALL calls. Please be more specific about what > you're proposing. The statistics relation STATRELATT is accessed in a few places in the planner. Since it is in the syscache it is accessed directly from there. I would like to add hooks so that stats data can come from somewhere else other than the syscache for tables, just as we can already do with get_relation_stats_hook(). These new changes would complete the existing feature to ensure it is fully usabled in the way originally intended. In selfunc.c: There are 3 calls to SearchSysCache(STATRELATT,...). In lsyscache.c: There is 1 call to SearchSysCache(STATRELATT...) in get_attavgwidth() and 2 calls to SysCacheGetAttr(STATRELATT...) in get_attstatsslot(). Calls to SearchSysCache(STATRELATT...) would be replaced by a call to an external module, if present, with a function pointer to get_relation_stats_hook(). This returns a tuple with stats info about that relation. Calls to SysCacheGetAttr(STATRELATT...) would be replaced by a call to an external module, if present with a function pointer to get_attribute_stats_hook(). This returns a stats slot. The call in ANALYZE would not be touched. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs <simon@2ndquadrant.com> writes: >> Surely you didn't mean ALL calls. Please be more specific about what >> you're proposing. > The statistics relation STATRELATT is accessed in a few places in the > planner. Since it is in the syscache it is accessed directly from there. > I would like to add hooks so that stats data can come from somewhere > else other than the syscache for tables, just as we can already do with > get_relation_stats_hook(). Well, defining the hooks as replacing STATRELATT lookups seems the wrong level of abstraction. What if you want to insert stats that can't be represented by the current pg_statistic definition? Even if there's no functional limitation, cons'ing up a dummy pg_statistic tuple seems the hard way to do it --- we didn't define get_relation_info_hook as working by supplying made-up catalog rows. Furthermore, many of the interesting cases that someone might want to hook into are code paths that don't even try to look up a pg_statistic row because they know there won't be one, such as two out of the three cases in examine_variable(). I think you need to move up a level, and perhaps refactor some of the existing code to make it easier to inject made-up stats. regards, tom lane
On Thu, 2008-06-26 at 12:50 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > >> Surely you didn't mean ALL calls. Please be more specific about what > >> you're proposing. > > > The statistics relation STATRELATT is accessed in a few places in the > > planner. Since it is in the syscache it is accessed directly from there. > > I would like to add hooks so that stats data can come from somewhere > > else other than the syscache for tables, just as we can already do with > > get_relation_stats_hook(). > > Well, defining the hooks as replacing STATRELATT lookups seems the wrong > level of abstraction. What if you want to insert stats that can't be > represented by the current pg_statistic definition? Even if there's no > functional limitation, cons'ing up a dummy pg_statistic tuple seems the > hard way to do it --- we didn't define get_relation_info_hook as working > by supplying made-up catalog rows. Furthermore, many of the interesting > cases that someone might want to hook into are code paths that don't > even try to look up a pg_statistic row because they know there won't be > one, such as two out of the three cases in examine_variable(). The reason for doing it this way is I'm interested in using stats literally copied from other servers. So the pg_statistic tuples will be available for us directly. I'm building a tool to allow people to export their production environment to a test system, so that SQL developers can experiment with query tuning and optimizer developers can recreate problems. > I think you need to move up a level, and perhaps refactor some of the > existing code to make it easier to inject made-up stats. Both sound like good ideas. I wasn't really after ultimate flexibility, but perhaps I should be. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hm, I assume we want to be able to turn on and off plugins in a running session? I think the "free_using_plugin" flag: ! if (get_relation_stats_hook) ! vardata->statsTuple = (*get_relation_stats_hook) ! (rte->relid, ! var->varattno); ! ! if (vardata->statsTuple) ! vardata->free_using_plugin = true; ! else ! vardata->statsTuple = SearchSysCache(STATRELATT, is insufficient to handle this. vardata->free_using_plugin could be true but by the time the variable is released the plugin pointer could have been cleared. Or worse, set to a different plugin. The easiest way to fix this seems like also the best way, instead of storing a boolean store the pointer to the release function. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote: > The easiest way to fix this seems like also the best way, instead of storing a > boolean store the pointer to the release function. OK, I like that better anyhow. Hadn't thought about turning plugin off, but I can see the requirement now your raise it. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote: > The easiest way Did you have further review comments? If so, I'll wait for those before making further mods. Thanks for ones so far. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote: > The easiest way to fix this seems like also the best way, instead of storing a > boolean store the pointer to the release function. Implemented as suggested. v5 enclosed. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support